SEQUENTIAL ANALYSIS OF TEST ITEMS


JOHN SCHMID, JR. 
Michigan State College 
East Lansing, Michigan 


MOST METHODS of identifying discrimin- 
ating test items use some modification of the pro- 
duct-moment coefficient of correlation or some 
modification of the chi-square test. Among the 
more commonly employed indices are the Flanagan
r, biserial r, tetrachoric r, and the phi
coefficient. These indices are used for testing the
hypothesis of independence between the test item 
and a criterion such as total score on the test. If 
the index attains some specified confidence level, 
the hypothesis of independence is rejected, and 
the test item is considered discriminating. Al- 
though these indices are variables, the numeri- 
cal value of the index does not contribute knowl- 
edge of the degree of discrimination, but only 
knowledge that the item does or does not discrim- 
inate within certain levels of confidence. Sequen- 
tial analysis of test items can provide this same 
information with minimum sampling of students’ 
responses, although the actual amount of work in- 
volved may not be materially reduced as compar- 
ed with the more conventional methods. It is the 
purpose of this paper to illustrate how sequential 
analysis may be used as an alternate way for sel- 
ecting discriminating items from a test for which 
item analysis data is available. 

Sequential analysis was developed as a way of 
improving quality control in industrial processes.
Products of a process are examined in a serial 
fashion until decision can be reached either to re- 
ject the entire lot of products as having too high 
a proportion of defectives or to accept the lot as 
conforming to specified standards. The applica- 
bility of sequential analysis is fairly general and 
it has been shown that it may be applied to some 
kinds of educational problems (1,4). As applied
to item analysis data, it involves setting up spec- 
ification values for accepting an item as being 
discriminating or rejecting it as non-discrimin- 
ating and, then, successively examining students’ 
responses to the item until one of the two speci- 
fication values is ascertained for it. This serves 
as evidence to deem the item discriminatory or 
not. 

By discriminating item is meant one which
correctly classifies students previously classified
according to some criterion. For example, if
students may be classified as ‘‘pass’’ and ‘‘fail’’, 
then an item which classifies these students cor- 
rectly into these two groups is said to be discrim- 
inating. However, the establishment of the cri- 
terion of classification is a matter to be decided 
regardless of the method used to analyze items. 
Item difficulty and score on the whole test will be
used in this article to set up a criterion for anal- 
yzing items sequentially. 

The method of analyzing items sequentially as 
described here is not intended to prescribe the 
only procedures by which sequential analysis 
may be used with item analysis data, but rather 
how it may be used with the concepts of discrim- 
ination set forth here. Other theories of the na- 
ture of discrimination or validity certainly would 
call for some revision of the steps. 


We will consider that the difficulty of an item 
is indicated by the percentage of students failing 
the item. That is, an item of 32 percent diffi- 
culty means that 32 percent of the students taking 
the test score incorrectly on the item. If the 
item is perfectly discriminating, this 32 percent 
will be the lowest ranking students as determin- 
ed by their scores on the whole test. If, however, 
the 32 percent failing the item had been high rank- 
ing students, the difficulty index is inconsistent 
with the criterion, total score, and the item 
should be considered as failing to perform in the
manner it was intended. It seems that difficulty 
and discrimination are not separable and that 
the item is functioning properly only when the 
two are mutually consistent. However, it should 
be made clear that the total number of correct
discriminations made by an item cannot be used 
as a workable index of discrimination. This may 
be seen by setting up a hypothetical example where 
the difficulty of the item is 90 percent. For sim- 
plicity, let us use ten students. We may rank 
the students from 1 to 10 on the basis of total 
scores, where 10 represents the highest score. 
In the following chart let 1, below the student’s
rank, represent a correct response and 0 represent
an incorrect response.

Ranking of Students        1  2  3  4  5  6  7  8  9  10

Discriminating Item        0  0  0  0  0  0  0  0  0  1

Non-Discriminating Item    [row not recoverable; per the text, one pass among ranks 1 to 9 and a fail at rank 10]


In the case of a perfectly discriminating item, 
it seems that 90 percent difficulty implies a di- 
vision of students into two groups between the 
ninth and tenth high ranking students and, further- 
more, the nine low ranking students should have 
failed the item as shown for the Discriminating 
Item. Notice, however, that the Non-Discrimin- 
ating Item, also of 90 percent difficulty, fails to 
separate the nine low ranking students in this 
consistent manner, yet this item made 8 out of 10 
correct discriminations. It is the argument, here,
that this item is no good by virtue of its having
made 100 percent incorrect discriminations
in the smaller of the two groups (in this case, the
highest ranking student) as classified by the level
of difficulty. In determining the discriminating
power of an item, it is proposed that the number 
of correct discriminations only of the smaller 
percentage group as classified by the difficulty 
level should be considered. It is obvious that the 
further the difficulty index deviates from 50 per- 
cent, the more stringent become the demands for 
correct discriminations. This, of course, should 
be so. In the practical world of achievement test- 
ing, discrimination at high or low level of diffi- 
culty may be nearly unobtainable, but that is a 
problem for the test constructor. 

It may be seen, also, that this concept of item 
discrimination relates item difficulty and discrim- 
ination in a way that their mutual consistency be- 
comes important. Thus, this concept is similar 
to Guttman’s (5) scaling theory. 

Prior to the examination of the responses, in- 
spection specifications must be set up for accept- 
ing or rejecting an item. For purposes of illus- 
tration, suppose we decide that if an item tends 
to have only 10 percent defective responses when 
the smaller group of classified papers are exam- 
ined serially, we will consider the item suffic- 
iently discriminating. Suppose, also, we decide 
that if the item tends to have 40 percent defective 
responses, the item is failing to do a satisfactory 
job of discrimination and we will reject it. These 
two proportions will be designated as p_0 and p_1.
The first proportion, p_0 = .10, indicates our
willingness to accept the condition that an item may
not be perfectly discriminating but that 10 percent
defective discriminations or less is sufficiently
small to warrant our acceptance of the item
as discriminating satisfactorily. The proportion,
p_1 = .40, indicates our belief that 40 percent or
more incorrect discriminations is sufficient 
grounds for rejecting an item. 


It is possible for an item to be discriminating 
by the criterion of p_0 and yet fail to tend to this
value because of sampling error. Therefore, we 
must decide upon a level of risk, which we are 
willing to accept, for this occurrence. Because 
of the difficulty of constructing discriminating 
items, we do not want to reject discriminating 
items very often, and therefore we may arbitrarily
decide that a low rejection risk value of .01
might be advisable. This value will be designated
as α = .01.

Moreover, there is the risk of erroneously
accepting the item as discriminating when in
reality the item is not discriminating by the
criterion, p_1 = .40. This risk is designated as β, and
may be thought of as the number of times out of 
one hundred which, on the average, we will make 
this mistake. Retention of a non-discriminating 
item is not as serious as rejection of a discrim- 
inating item. Non-discriminating items generally 
are ‘‘deadwood’’ in a test and, except for wasting 
testing time, do not seriously influence test re- 
liability or validity. Hence we may be willing to 
use a higher risk value such as β = .30 for this
contingency. 

In summary, then, we specify that we will
reject an item as non-discriminating only one
percent of the time if the proportion of defective
responses to it is .10, and we will accept it as
discriminating thirty percent of the time if the
proportion of defective responses to it is .40.

That is, our specification values are:

p_0 = .10,     α = .01
p_1 = .40,     β = .30


The next step is to set up critical values for
accepting or rejecting an item as discriminating
or not for successive observations of student
responses. This is done by solving the following
equations for d_m and g_m:

$$\left(\frac{p_1}{p_0}\right)^{d_m}\left(\frac{1-p_1}{1-p_0}\right)^{g_m} = \frac{1-\beta}{\alpha}$$

$$\left(\frac{p_1}{p_0}\right)^{d_m}\left(\frac{1-p_1}{1-p_0}\right)^{g_m} = \frac{\beta}{1-\alpha}$$

where d_m is the number of defective responses to
an item, and g_m is the number of good responses.

In order to simplify these equations, logarithms
are taken:

$$d_m \log\frac{p_1}{p_0} + g_m \log\frac{1-p_1}{1-p_0} = \log\frac{1-\beta}{\alpha}$$

$$d_m \log\frac{p_1}{p_0} + g_m \log\frac{1-p_1}{1-p_0} = \log\frac{\beta}{1-\alpha}$$

Using our values, these equations become:

$$d_m \log 4 + g_m \log(.67) = \log 70$$

$$d_m \log 4 + g_m \log(.67) = \log(.30)$$

Referring to a table of natural logarithms we
get:

$$1.386\,d_m - .405\,g_m = 4.248$$

$$1.386\,d_m - .405\,g_m = -1.191$$
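
As a check on the arithmetic, the coefficients of the two critical lines can be computed directly from the four specification values. The sketch below is illustrative only: the function name `critical_lines` and its return layout are assumptions, not part of the original procedure. It uses natural logarithms, which is what the printed coefficients 1.386 and -.405 correspond to.

```python
import math

def critical_lines(p0, p1, alpha, beta):
    """Return (a, b, reject_rhs, accept_rhs) for the lines a*d_m + b*g_m = rhs."""
    a = math.log(p1 / p0)                       # weight of each defective response
    b = math.log((1 - p1) / (1 - p0))           # weight of each good response (negative)
    reject_rhs = math.log((1 - beta) / alpha)   # upper (rejection) line
    accept_rhs = math.log(beta / (1 - alpha))   # lower (acceptance) line
    return a, b, reject_rhs, accept_rhs

a, b, reject_rhs, accept_rhs = critical_lines(0.10, 0.40, 0.01, 0.30)
print(round(a, 3), round(b, 3), round(reject_rhs, 3), round(accept_rhs, 3))
# 1.386 -0.405 4.248 -1.194
```

The last value comes out near -1.194 rather than the printed -1.191; the difference is small and presumably reflects rounding in the original table lookup.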


The equations of these lines may be graphed
in the conventional manner as shown below. These
critical lines taken with their reference axes, g_m
and d_m, will be called the specification chart.

After the specification chart has been made,
the difficulty of the item must be determined.
This difficulty index then is used to divide the
students into two groups: those ranking low who
theoretically should fail the item and those who
theoretically should pass the item in the correct
proportion of low and high test scores. For
example, suppose the item difficulty is found to be
32 percent. This is used to classify all the pa- 
pers into two groups: the 32 percent lowest rank- 
ing students and the 68 percent highest ranking 
papers. Being the smaller group, the 32 per- 
cent lowest ranking papers are examined one by 
one to determine the proportion of defective re- 
sponses to the item. A response to an item is 
deemed good if the item is scored incorrectly and 
defective if scored correctly. Had the difficulty 
been 68 percent, then an item scored correctly 
by a student in the high 32 percent group would 
be considered a good response and an item scored 
incorrectly would be considered defective. 
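
The classification rule just described can be sketched in code. The function below is a hypothetical illustration: its name, its input layout (a list of `(total_score, answered_correctly)` pairs), the order in which papers are examined, and the treatment of a difficulty of exactly 50 percent are all assumptions that the text does not specify.

```python
def label_smaller_group(papers):
    """papers: list of (total_score, answered_correctly) tuples, one per student.

    Returns 'g' (good) or 'd' (defective) for each paper in the smaller
    group, in an assumed order of examination: lowest rank first for a
    low group, highest rank first for a high group.
    """
    n = len(papers)
    n_fail = sum(1 for _, correct in papers if not correct)
    difficulty = n_fail / n                      # proportion failing the item
    ranked = sorted(papers, key=lambda p: p[0])  # lowest total score first

    if difficulty <= 0.5:
        # Smaller group: the lowest-ranking students, who should fail the item.
        group = ranked[:n_fail]
        return ['d' if correct else 'g' for _, correct in group]

    # Smaller group: the highest-ranking students, who should pass the item.
    group = ranked[::-1][:n - n_fail]
    return ['g' if correct else 'd' for _, correct in group]

# Ten students ranked 1 to 10, item of 90 percent difficulty,
# only the highest-ranking student passes (the discriminating case).
discriminating = [(rank, rank == 10) for rank in range(1, 11)]
print(label_smaller_group(discriminating))  # ['g']
```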


[Specification Chart: the two critical lines plotted with g_m on the horizontal axis and d_m on the vertical axis.]

The test analyzer now is ready to determine
if an item is discriminating or not. The difficulty
of the item having been determined in our example
as 32 percent indicates that we should take the 32
percent low ranking papers and examine the
students’ responses to it one at a time. If the first
student has answered the item incorrectly, the
response was good and g_m = 1 is marked off on the
horizontal axis of our specification chart. The
second paper is examined. Again the student
may have marked the item incorrectly, and the
response again was good so a line is drawn from
g_m = 1 to g_m = 2 on the horizontal axis. Suppose the
third paper shows the item marked correctly.
Then the response was defective and d_m = 1, so a
vertical line of one unit is drawn at the point
g_m = 2. This process is continued until our
increasing broken line crosses either the higher or
lower of the critical lines. As soon as this occurs
we have reached a decision that the item is either 
discriminating or not. As long as the broken line 
fails to cross either critical line, we have no 
basis for judging the item. In that case we must 
continue examining student responses until this 
occurs. 

In our example successive responses were
as follows: g, g, d, g, d, d, g, g, g, g, d, d, d
(g indicates good, d indicates defective). After
examining responses to an item on thirteen pa- 
pers, it was decided to reject the item as non- 
discriminating. If the papers of the group are 
exhausted before a critical line has been crossed, 
then we have no basis without further testing of 
students for concluding that the item discrimin- 
ates or not. 
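
The graphical procedure can also be followed numerically. The sketch below is a minimal illustration under the specification values used in this article; the function name `sequential_decision`, the string codes, and the choice to treat touching a line as crossing it are assumptions.

```python
import math

def sequential_decision(responses, p0=0.10, p1=0.40, alpha=0.01, beta=0.30):
    """Examine responses ('g' or 'd') one at a time.

    Returns (decision, n_examined), where decision is 'reject', 'accept',
    or 'undecided' if the papers run out before a critical line is reached.
    """
    a = math.log(p1 / p0)                  # step for a defective response
    b = math.log((1 - p1) / (1 - p0))      # step for a good response
    upper = math.log((1 - beta) / alpha)   # rejection line
    lower = math.log(beta / (1 - alpha))   # acceptance line
    d = g = 0
    for n, r in enumerate(responses, start=1):
        if r == 'd':
            d += 1
        else:
            g += 1
        value = a * d + b * g
        if value >= upper:
            return 'reject', n
        if value <= lower:
            return 'accept', n
    return 'undecided', len(responses)

example = "g g d g d d g g g g d d d".split()
print(sequential_decision(example))  # ('reject', 13)
```

With these values the thirteen responses of the example lead to rejection on the thirteenth paper, in agreement with the text.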

The values p_0, p_1, α, and β should be
determined by the analyzer in view of the kind of test
he is analyzing and the purposes of the test. As
stated before, the values used in this article are
not recommended but were used only for
illustration.

In summary, then, the sequential analysis of 
test items involves the following steps: 


1. Setting up specification values and graphing 
the critical lines. 

2. Ranking of the test papers in order of test 
scores. 

3. Determination of the item difficulty. 

4. Division of the papers into two groups, using 
the difficulty of the item and the total scores 
as criteria. 

5. Selection of the smaller percentage of papers.

6. Determining successively whether the responses
are defective or good, and marking these
decisions on graph paper.

7. Continuing the analysis until the graph of the
responses crosses a critical line, thereby
indicating whether to reject the item as
non-discriminating or to accept the item as
discriminating.
A COMPARISON OF TWO METHODS OF
INSTRUCTION IN BEGINNING DRAWING


CYRIL J. HOYT, CLAYTON L. STUNKARD, 
W. REID HASTIE, CLIFTON A. GAYNE, 
MILDRED M. PAGE and PAUL R. WENDT* 


University of Minnesota 


Introduction 


AN EXPLORATORY research program 
in art education was initiated at the University of 
Minnesota during the academic year 1949-1950.
The Department of Art Education, jointly with 
the Bureau of Educational Research and with the 
cooperation of the Audio-Visual Department, 
set up an experiment designed to compare 
two methods of teaching drawing. One of the 
methods (Method 1) was essentially that described
by Hoyt Sherman in his book, Drawing by
Seeing;¹ the other (Method 2) was the
program customarily employed with beginning
students in Elementary Education at
the University of Minnesota.


Four classes of students majoring in elementary
education were divided into two sections each,
one to be taught by Method 1 and the other by
Method 2. In all, a total of 132 students (110 females
and 22 males) participated in this study. Prior
to their enrollment in this course, these students
had only the usual art training offered in
elementary and secondary schools. The manner in which
the students in a given class were assigned to
each of the two sections was strictly a random
one. Using a table of random numbers,² a
number was drawn for each student. Students with
even numbers made up one section and those with
odd numbers the other. Table I represents the
composition of these sections with respect to the
number of males and females in each.

The Ohio Psychological Test³ was administered
to all students before inaugurating the
differential treatment. Pre- and post-experimental
measurements of ‘‘art judgment’’ were obtained
by using the Meier Art Judgment Test.⁴ The
Ohio and the Meier tests were chosen to serve as
controlling variables in the analysis of the
experiment. The outcome criteria were ratings on
three drawings: a still life, a landscape, and a
figure drawing. Pre-experimental control
measures were also obtained in these same three
aspects of drawing ability. Conditions as nearly
identical as physically possible were maintained on
both occasions for these drawings. A discussion of
the rating procedure appears elsewhere in this
manuscript.


The Teaching and Rating Methods 


The two groups of students in each class were 
separated and a different treatment administered 
to each during the thirty-six hours of laboratory 
sessions over a six-week period. Careful
attention was given to assure that the four sections of
Method 1 would receive identical experience and 
that the fundamental pattern for Method 2 was the 
same in all sections. 


Method 1: Two locations were used. One was
a section of the basement of the Museum of
Natural History. The other was a very large room
in the Armory with a ceiling 30 feet from the
floor. Both places provided black-out facilities,
dimming lights, tachistoscopic projector, flood
lamps, record player, screens and work desks.

* San Francisco State College, California (formerly at University of Minnesota).

1. Hoyt L. Sherman. Drawing by Seeing (New York: Hinds, Hayden and Eldredge, 1947).

2. M. G. Kendall and B. Babington Smith. Tables of Random Sampling Numbers, Tracts for Computers, No. XXIV (London, England: Cambridge University Press, 1939).

3. Ohio College Association Committee on Intelligence Tests for Entrance. Ohio State University Psychological Test, Form 22 (Columbus, Ohio: Ohio State University, 1943).

4. Bureau of Educational Research and Service. Meier Art Tests, I. Art Judgment, 1949 Edition (Iowa City: University of Iowa, 1940).

Each student was given, on an average day, 
twenty sheets of paper and large sticks of char- 
coal or chalk. The paper was clipped to work 
tables with large clamps to permit easy handling 
in a darkened room. 

White lights were turned off to permit the eyes 
of the students to become adapted to darkness. 
Music was started at the beginning of this period 
and was continued throughout the laboratory
session. Informality, conversation, and free
movement were encouraged. After a ten-minute
interval, a signal was given to commence work.
Following each slide the signal for the next came 
when the noise of paper turnover suggested that 
the class was ready for a new subject. 

The slides used for this drawing in the dark 
were photographic copies of the examples given 
in Sherman’s book.⁵ A total of 94 slides were
selected, but by reversing these and adopting other
techniques an increased number of drawing
experiences was provided. Approximately 450
drawings were made by each student during the 
program. Twenty different slides were shown 
each work period. The sequence of abstract 
slides followed a pattern from simple to complex
experience with emphasis in the following order: 
position, size, size and background, brightness, 
central vision and detail, slides used with flank- 
ing screens, and slides used with floor projection. 
Variations by reversal, review, and other meth- 
ods were controlled within this sequence pattern 
to increase the number of laboratory learning 
experiences. In addition, objects grouped in the 
flash area, flood lighted sections of room and 
posed and moving live models served as subject 
material for other drawings. A few draw- 
ings from slides of paintings by Cezanne and other 
artists were made. 

A flash duration of 1/10 second was established
for the first days of the experiment, but this
was gradually increased to 1 second, 1 minute,
3 minutes, and 10 minutes. Light in the work area
was increased gradually until at completion of 
the program normal lighting was established. 


Method 2: The method was, in general, a
series of developmental experiences with evaluation
and discussion after each drawing experience. 
This follows a pattern commonly used in many 
art classes. An important objective for this group 
was to develop art as expression. It then becomes 
a method of achieving greater satisfaction for the 
individual student and also provides the prospec- 
tive elementary teacher a channel through which 
she can approach children to understand and
appreciate their expressions.

Certain assumptions were accepted as a base.
These included the following:


1. Anyone can be taught to draw. Drawing
can be learned and it can be reduced to a
simple system.


2. Drawing is an artificial language.


3. Drawing has a vocabulary systematized
into a series of pictorial symbols.


The emphasis in drawing activities was placed 
on symbolic communication through the selection 
of an interesting pattern of images and shapes 
that would carry an idea. In the evaluation and 
discussion the emphasis was directed toward an 
appreciation of an imaginative kind of experience. 

Drawing activities formed a sequence starting 
with the free organization of simple geometric 
solids into a composition using arbitrary (dram- 
atic) light and shade. These drawings (initial at- 
tempts at graphic representation) were mounted 
and presented for class discussion. Such ques- 
tions as the following were used to center atten- 
tion on the tools and principles an artist uses to 
communicate ideas: Which one do you like best? 
Why? What ideas are suggested by these simple 
abstract compositions? How is this accomplish- 
ed? 

The second activity in the sequence was initi- 
ated by a discussion of tools used to strengthen 
the communication of an idea by controlling visual 
elements such as line, shade, tone, space, etc.
Again, simplified geometric shapes were used, but
the objective was modified to obtain visual connota- 
tion in symbols based on what these forms sug- 
gest from past experience. The discussion fol- 
lowing this emphasized the feelings and moods 
suggested by the drawings and how they were ex- 
pressed through a simplified handling of art el- 
ements and visual symbolism. 

The third drawing was a continuation of these 
ideas starting with an abstract idea as a theme 
and combining associated symbols in such a way 
that the idea is conveyed to an observer. The re- 
lationship to other art forms was brought out 
through the use of tape recorded readings of poetry 
of T. S. Eliot and Vachel Lindsay and of Gertrude 
Stein’s ‘‘Four Saints in Three Acts.’’ In these,
the sound pattern was an important factor in cre- 
ating feeling and an imaginative kind of experience. 

Next, a problem in still-life was approached, 
directing attention to the possibilities of a more 
personal approach through attention to space, col- 
or, design and a freer interpretation of the sub- 
ject. 

The series ended with figure drawing from
memory and from a posed model. The key to the
method for this group rested in the constant stress
on joint participation in discussion and evaluation,
the emphasis on expression of ideas, the means
at hand to improve the visual pattern for the
observer, and the constant relationship of the
experience background of the student to each new
concept that was developed. References were
made to examples of acceptable expression from
the field of art history and to child expression
and development. An attempt was made to have
each student formulate his own understanding of
what the instructor and student were trying to do
in each lesson, why they were doing it, and how,
through this understanding, the student might
improve his own drawings.

5. Sherman, op. cit., pp. 20-23.

[Figure 1: design for 2 methods, 4 classes, 3 raters and 3 types of drawing.]


TABLE III

ANALYSIS OF VARIANCE OF MEIER ART JUDGMENT
PRE-EXPERIMENTAL MEASUREMENTS

Source of
Variation     D.F.   Sum of Squares     M.S.       F             Hypothesis*

error         102      8802.86861      86.3026     --
MxC             3       215.17342      71.7245     (illegible)   accepted

Residual      105      9018.04203      85.8861
M (method)      1       141.50694     141.5069     1.648         accepted
C (class)       3       160.60778      53.5359     (illegible)   accepted

*The hypothesis tested is the null hypothesis relative to the variation associated
with the source indicated for that line.


TABLE IV

ANALYSIS OF VARIANCE OF OHIO PSYCHOLOGICAL TEST
PRE-EXPERIMENTAL MEASUREMENTS

Source of
Variation     D.F.   Sum of Squares     M.S.

error         102     50,546.46377    495.5536
MxC             3      1,249.29075    416.4303

Residual      105     51,795.75452    493.2929
M (method)      1          4.11587      4.1159
C (class)       3        535.06913    178.3564
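
The degrees of freedom and F ratios of Table III can be checked from the printed sums of squares and mean squares. The sketch below is only an arithmetic check, not the authors' computation. It assumes that the M x C interaction is tested against the "error" line and that method and class are tested against the "Residual" line; the second assumption is suggested by the printed F of 1.648 for method, which equals M.S.(M) divided by M.S.(Residual).

```python
# (sum of squares, mean square) as printed in Table III (Meier Art Judgment)
table_iii = {
    "error":    (8802.86861, 86.3026),
    "MxC":      (215.17342, 71.7245),
    "residual": (9018.04203, 85.8861),
    "M":        (141.50694, 141.5069),
    "C":        (160.60778, 53.5359),
}

# Degrees of freedom implied by S.S. / M.S.
df = {name: round(ss / ms) for name, (ss, ms) in table_iii.items()}

# Assumed error terms: MxC against "error"; M and C against "residual".
f_mxc = table_iii["MxC"][1] / table_iii["error"][1]
f_m = table_iii["M"][1] / table_iii["residual"][1]
f_c = table_iii["C"][1] / table_iii["residual"][1]

print(df)
print(round(f_mxc, 3), round(f_m, 3), round(f_c, 3))
```

The implied degrees of freedom for Residual, method, and class (105 + 1 + 3 = 109) agree with a total of 110 subjects, the number of females retained in the analysis.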

During the experimentation period,
approximately 6 large drawings were completed by each
subject under Method 2 in addition to a number
of rough experimental sketches. The media
(charcoal and colored chalk) were similar to those for
Method 1.


Rating Method for Drawings: As soon as a 
set of drawings used as pre- and post-experiment- 
al measures was completed, each drawing was 
examined individually by Bureau employees who 
removed identifying names and replaced these 
by consecutive numbers from 1 to 132, selected
in random order. Since each set of drawings 
was treated separately from the others, no stu- 
dent’s drawings would have the same identifica- 
tion number to represent that individual’s work. 
The Art Education Department received the
numbered drawings at the end of the experimental
period. Before the three raters were able to 
score the drawings, a period of time elapsed. 
Each of the raters worked independently, using 
his own method and criteria for judging. No con- 
ferences were held to standardize criteria or 
compare final ratings. 

Each rater had been requested to rate the 
drawings for a given set into nine groups whose 
frequencies correspond to those of the Stanine 
system.⁶ More than nine groups would be
difficult to handle, and fewer might result in too crude
a rating scale. Drawings placed in each group 
were given scores to correspond to the nine groups, 
with a unit score representing the lowest obtain- 
able and one of nine the highest. The nine rating 
groups and the number of drawings to be so clas- 
sified appear below: 


Rating        1   2   3   4   5   6   7   8   9

Frequency     5   9  16  23  26  23  16   9   5
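
The requested frequencies can be compared with the nominal stanine percentages. The sketch below is an illustrative check only; the percentages 4, 7, 12, 17, 20, 17, 12, 7, 4 are the standard stanine definition and are supplied here as an assumption, since the text does not list them.

```python
# Nominal stanine percentages for ratings 1 through 9 (standard definition).
stanine_pct = [4, 7, 12, 17, 20, 17, 12, 7, 4]
# Frequencies requested of the raters for each set of drawings.
frequency = [5, 9, 16, 23, 26, 23, 16, 9, 5]

n = sum(frequency)                                   # total number of drawings
expected = [n * p / 100 for p in stanine_pct]        # unrounded stanine counts
largest_gap = max(abs(f - e) for f, e in zip(frequency, expected))

print(n)                                  # 132
print([round(e, 2) for e in expected])
print(round(largest_gap, 2))
```

The frequencies sum to 132, the number of drawings in a set, and each stays within one drawing of the unrounded stanine count.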
The general rating procedure employed by
the three raters consisted in (1) arranging all 
drawings in numerical order on the floor, (2) ex- 
amining the drawings as a group, (3) selecting a 
number of the superior drawings, (4) ranking 
these as nearly as possible in rank order, and 
(5) separating them into the rating groups called 
for in the above schedule. The last three steps 
were next used on a number of the poorer
drawings. Thus the raters worked from the
extremes toward the center, where individual
discrimination was not so great. After
all drawings had been arranged in groups corres- 
ponding to the rating schedule, score assignments 
were recorded, and the drawings were arranged 
in the original numerical sequence for the next 
rater. 


Design of the Investigation 


The reader will observe that in relation to 
the number of females employed in this study, the 
males are decidedly in the minority. After con- 
sideration of possible sex differences (which the 
data from the study would be unable to elicit for 
reason of the limited number of males in the sub- 
classes) the decision was made to confine the an- 
alyses to data obtained on the females. 

One of the first questions to be answered con- 
cerned how the ratings of the drawings would be 
employed. Any one of the three types of draw- 
ings could be used individually as a variable (pre- 
or post-measure) or all three types of drawings 
could be utilized to yield a composite score of 
drawing ability. The desirability of employing 
a composite score in the evaluation of such an 
experiment is evident. 

A 2 x 4 x 3 x 3 factorial design was used in
the investigation of the main effects and interac- 
tions of four factors, namely: method (2 levels), 
class (4 levels), type of drawing (3 levels), and 
rater (3 levels). 

Since the ratings were made in a manner that 
permitted no mean differences between drawings
or raters to appear, the importance of these
analyses was to ascertain whether either of these
classifications interact with one another or with
one or both of the other two classifications. Of
special importance is the interaction of rater clas- 
sification with one or more of the other three 
factors. The presence of such interactions would 
be indicative of unreliability of the rating proced- 
ure. Depending upon results obtained here, sev- 
eral different approaches would be indicated. 


1. If the rater and drawing classifications
interact, it could be considered necessary to treat
the ratings as nine separate variables. This
would still be the case if, in addition, both of
these classifications interact with one or both
of the remaining factors. Under such
conditions, drawing conclusions regarding the
instruction would be extremely difficult.


2. If the rater classification does not interact
with that of drawing but does with either of
the factors, method or class, or both, then
the three ratings for a given rater may well
be summed to yield only three separate
student ability scores. Generalizations from such
findings would, likewise, be difficult to draw.


3. If the drawing classification interacts with that
of method and/or class factor, but not with
rater, a summation of the three ratings of a
given drawing produces three student scores,
that is, three composite ratings, each
representing a measure of the different types of
drawings. Reliable conclusions, however,
can be drawn under such conditions without
providing separate analyses of each of the
composite measures.


4. Nine different variables would also be
required when the rater and drawing factors do not
interact, but each of them interacts with a
third factor and one or the other or both with
the fourth factor. Here again, reliable
conclusions concerning the instructional methods
would be lacking.


5. Lastly, if there is no interaction of any kind
among the four factors other than between
method and class, it would be permissible,
but not necessary, to add the nine ratings to
obtain a single composite measure of drawing
ability for each student.


6. John C. Flanagan. "Units, Scores and Scales," in Educational Measurement (Washington, D.C.: American Council on Education, ...), pp. ...

(March 1952)
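
The five cases above amount to a small decision rule, which can be sketched as follows. This is a simplified illustration, not the authors' procedure: the function name `scoring_plan` and its three Boolean inputs are assumptions, and case 4 is reduced to "rater and drawing each interact with method and/or class", which is coarser than the wording of the text.

```python
def scoring_plan(rater_x_drawing, rater_x_design, drawing_x_design):
    """Return how many separate score variables the five cases call for.

    rater_x_drawing:  rater interacts with type of drawing
    rater_x_design:   rater interacts with method and/or class
    drawing_x_design: type of drawing interacts with method and/or class
    """
    if rater_x_drawing:
        return 9   # case 1: nine separate variables
    if rater_x_design and drawing_x_design:
        return 9   # case 4 (simplified): nine separate variables
    if rater_x_design:
        return 3   # case 2: one summed score per rater
    if drawing_x_design:
        return 3   # case 3: one composite per type of drawing
    return 1       # case 5: a single composite of all nine ratings

print(scoring_plan(False, False, False))  # 1
```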


Above all, it is obvious that the results, or 
evaluation, of this experiment depend upon obtain- 
ing reliable ratings through the rating procedure. 
In this respect the present experiment is certain- 
ly not unique. 

The factorial design involves a split-plot an- 
alysis where the factors of method and class and 
their interaction are tested by a ‘‘whole plot’’ er- 
ror and the remaining effects and interactions are 
tested with a ‘‘split-plot’’ error. The preceding
design sets the analysis for the pre-experimental
ratings, as well as the post-experimental
ratings. In addition, the same design would
function for the analysis of variance of outcome
measures when controlling pre-test rating, provided
that such an analysis of covariance is warranted.

A diagrammatic representation of this design is
presented in Figure 1. The depth dimension rep- 
resents the individuals within the eight subclasses 
indicated by the solid block. Table II presents 
an outline of the analysis of the different sources 
of variance separable under this design. 

Any analyses carried out on the control
variables (Ohio Psychological Test and the Meier Art
Judgment Test) are set by the whole plot portion
of the above design. This means that if analyses 
(of covariance) of the outcome ratings are to be 
made controlling one or both of these independent 
variables, they can be performed only with rela- 
tion to the whole plot design. 

Attention of the reader is directed to the fact 
that the set of nine ratings of the three drawings 
by the three raters representing either pre- or 
post-measures of drawing ability are made for a 
single individual rather than nine different indiv- 
iduals. For this reason, the split-plot analysis 
is the appropriate one rather than the usual fac- 
torial treatment. 


Analysis of the Data 


The symbols X, Y, Z_1, and Z_2 are employed
in this report to indicate the following:


X = the rating on the pre-experimental drawings

Y = the rating on the post-experimental drawings

Z_1 = the Meier Art Judgment pre-experimental
measurements

Z_2 = the Ohio Psychological Test pre-experimental
measurements


In order to complete the analyses, the approximate
method of Tsao⁷ was adopted for the analysis
of variance and covariance since the numbers
in the eight basic subclasses departed but little
from one another. Table I shows them to range
from 11 through 16 with a median value of 14 and
a mean of 13.75.

The Tsao approximation requires no adjust- 
ment of the observed mean values while adjusting 
the observed deviational sums of squares and 
products within the subclasses by the factor of 
13.75 times the reciprocal of the number of subjects
for a particular subclass.
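
The adjustment just described can be sketched in a few lines. This is only an illustration under stated assumptions: the eight subclass sizes below are hypothetical, chosen to agree with the range (11 through 16), median (14) and mean (13.75) reported in the text, and the function names are invented for the example rather than taken from Tsao's paper.

```python
from statistics import mean, median

# Hypothetical subclass sizes (assumption): the text reports only that the
# eight sizes range from 11 to 16 with median 14 and mean 13.75.
subclass_sizes = [11, 12, 13, 14, 14, 15, 15, 16]


def tsao_factors(sizes):
    """Return, for each subclass, the mean subclass size divided by its own size.

    This is the multiplier applied to the within-subclass deviational sums of
    squares and products; the observed means are left unadjusted.
    """
    n_bar = mean(sizes)
    return [n_bar / n for n in sizes]


def adjust_within_ss(within_ss, sizes):
    """Scale each subclass's deviational sum of squares by its factor."""
    return [ss * f for ss, f in zip(within_ss, tsao_factors(sizes))]


n_bar = mean(subclass_sizes)
factors = tsao_factors(subclass_sizes)
for n, f in zip(subclass_sizes, factors):
    print(f"n = {n:2d}   factor = {f:.4f}")
```

Subclasses smaller than the mean receive a factor above one and larger subclasses a factor below one, so each subclass contributes as though it held 13.75 subjects.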

The analysis of variance for the Meier Art 
Judgment pre-experimental measurements is given
in Table III. From this analysis, one observes
no significant main effect or interaction. For this 
analysis, as well as all others included in the 
present investigation, the one percent level of 
significance is used for rejection of the hypothesis
involved. These hypotheses are, in all cases,
statements in the form of a null hypothesis.


7. Fei Tsao. "General Solution of the Analysis of Variance and Covariance in
the Case of Unequal or Disproportionate Numbers of Observations

A similar analysis of variance for the pre-experimental
measurements of the Ohio Psychological
Test appears in Table IV. The results obtained
are noticeably similar to those for the
Meier Test. 

Both analyses above indicate that differences 
do not exist among the subclasses with respect 
to the two variables concerned. This, however, 
is not the situation for either the pre- or post- 
experimental ratings of drawing ability. Tables 
V and VI present the analyses of variance for these 
two variables. Wherever the probability associated
with the test of significance for an interaction
effect falls between .05 and .01, that source
is not represented in the residual source, but is
rather tested again with a different mean square
representing the residual source. If this latter
test does not indicate an associated probability of
less than .01, the null hypothesis is accepted.

The analysis of variance of ratings of pre- 
experimental drawing ability exhibits two statis- 
tically significant interaction effects. One of 
these, the interaction of drawing with class, rep- 
resents a replication effect and indicates the ex- 
istence of mean differences in pre-experimental 
drawing ability between classes dependent upon 
the type of drawing. The average pre-experi- 
mental drawing ability over the three types of 
drawings does not vary significantly from class 
to class. 

The second statistically significant interaction, 
that between method and drawing classifications, 
is of special importance. It demonstrates differ- 
ential levels for the method groups dependent up- 
on drawing ability prior to the differential treat- 
ment of these groups during the experiment. Dif- 
ferences between methods one and two for the 
pre-experimental ratings of the still-life (drawing
1), landscape (drawing 2), and figure drawings
are 0.561, 0.549, and -0.033, respectively.

Since these values represent differences obtained
by subtracting the mean for the second method
from that of the first, one observes that the
means for the group which subsequently received
experimental treatment under method one are
apparently greater than those for the group which
received treatment under method two. This is
the case only for the still-life and landscape 
drawings. The difference for the figure drawing 
is relatively unimportant in comparison with those 
for the other two drawings. 

One main effect and three interactions are 
shown to be statistically significant in the analy- 
sis of the ratings of post-experimental drawing 
ability. Of these, the main effect of class and 
two of its interactions with other factors again 
emphasize the effect of and need for replication in
experimentation. Through the use of replication, 
one is better able to evaluate the experimental 
factors, which in this case are represented by
the two methods for the three types of drawings. 

The remaining interaction which was statistically
significant but does not represent a difference
in replicates is again that between the method
and drawing classifications. This time the
differences between methods one and two for each
of the three types of drawings are -0.706, 0.213
and -0.020 for the still-life, landscape and figure
drawings, respectively. Of particular interest
is the inversion of method difference for the still- 
life drawing. Prior to the differential treatment 
of the experiment, the mean difference favored 
the subjects who were to be instructed under method
one. After the differential treatment of the
experiment the mean difference favored the group 
which received experimental treatment under 
method two. Also, it is noted that such an inver- 
sion occurred only for the still-life drawing. 

As a result of the fact that ratings were obtain- 
ed separately for the pre- and post-experimental 
drawings, and then permitted to vary around the 
same mean for all measures, estimates of the
absolute change throughout the experiment can- 
not be made. Such absolute change is, in fact, 
of no extreme importance here due to the inabil- 
ity to place the measurement of drawing ability 
on any absolute scale. The efficacy of the train- 
ing under either method could, however, be noted 
by cursory examination of pre- and post-experi- 
mental drawings by the many subjects. Of prime 
importance here are the differential effects of training,
rather than an overall training effect. For
this reason, the analysis of covariance provides 
the appropriate treatment for the data. 

In order to determine which independent var- 
iables were of any utility in assessing the differ- 
ential training effects, it was necessary to make 
tests of certain regression coefficients and their 
corresponding correlation coefficients. These 
tests are given in Table VII. 

By examination of Table VII one notes that in
all instances (denoted in Table VII by tests numbered
1, 4, 7, and 13) where the ratings of post-experimental
drawing ability are regressed on
the ratings of pre-experimental drawing ability,
either neglecting or partialling out the effect of
other pre-experimental measures, the resulting
regression coefficient is significantly different
from zero. Likewise, in similar instances
(denoted in Table VII by tests numbered
2, 5, 10, and 14) where such post-experimental
ratings are regressed upon the Meier Art Judgment
Test measures, the resultant regression
coefficients are not significantly different from
zero. The Ohio Psychological Test measurement
acts in the same manner as an independent variable
(denoted in Table VII by tests numbered 3,
8, 11 and 15) as does the pre-experimental rating,
except to a lesser degree.

Table VII also indicates that all the multiple 
TABLE VII


SIGNIFICANCE TESTS OF REGRESSION AND CORRELATION REPRESENTING THE
REGRESSIONS OF RATINGS OF POST-EXPERIMENTAL DRAWINGS ON THE SEVERAL
PRE-EXPERIMENTAL MEASURES, SINGULARLY AND IN COMBINATION
(WHOLE-PLOT VALUES)


Test   Source of
No.    Variation          Σy²           F          Hypothesis

 1     b_y.x              .68994      247.022      rejected
       Residual           .29402

 2     b_y.z1             .14784        3.009      accepted
       Residual           .83612

 3     b_y.z2             .85856        9.9997     rejected
       Residual           .12540

 4     b_yx.z1            .79921      244.613      rejected
 5     b_yz1.x            .83540        3.015      accepted
 6     b_y.xz1            .63461      123.814      rejected
       Residual           .34935

 7     b_yx.z2            .30938      243.728      rejected
 8     b_yz2.x            .34627        8.967      rejected
 9     b_y.xz2            .65565      126.347      rejected
       Residual           .32831

10     b_yz1.z2           .05433                   accepted
11     b_yz2.z1           .83015                   rejected
12     b_y.z1z2           .88448                   rejected
       Residual           .09948

13     b_yx.z1z2          .87790      241.386      rejected
14     b_yz1.xz2          .05400        2.178      accepted
15     b_yz2.xz1          .16440        8.161      rejected
16     b_y.xz1z2          .09630       83.908      rejected
       Residual           .88766

       Σy²            1940.98396
TABLE VIII


TESTS OF SIGNIFICANCE OF THE DIFFERENCES BETWEEN CERTAIN
ZERO ORDER AND MULTIPLE CORRELATION COEFFICIENTS


Source of Variation             Σy²          M.S.       Hypothesis

R²_y.xz1 - r²_yx                4.94467      4.9447     accepted
Σy²(1 - R²_y.xz1)             558.34935      5.5835

R²_y.xz2 - r²_yx               12.96571     12.9657
Σy²(1 - R²_y.xz2)             550.32831      5.5033

R²_y.xz1z2 - r²_yx             15.40636      7.7032
Σy²(1 - R²_y.xz1z2)           547.88766      5.5342

Σy²                           563.29402


TABLE IX


A TEST OF SIGNIFICANCE FOR THE REGRESSION OF RATINGS OF POST-
EXPERIMENTAL DRAWING ABILITY ON RATINGS OF PRE-EXPERIMENTAL
DRAWING ABILITY
(SPLIT-PLOT VALUES)


Source of Variation       Σy²           M.S.       Hypothesis

b_y.x                     36.93131     36.9314     rejected
Residual                1146.08033      1.4062

Σy²                     1183.01174


(Vol. XX, March 1952)


TABLE X


THE COMPLETE ANALYSIS OF VARIANCE AND COVARIANCE OF RATINGS OF
POST-EXPERIMENTAL DRAWING ABILITY, CONTROLLING RATINGS OF
PRE-EXPERIMENTAL DRAWING ABILITY

[Table body not legible in the source. Recoverable headings: Source of
Variation (M, method; C, class; D, drawing; R, rater; their interactions;
Residual), Reduced Σy², M.S., Hypothesis.]

JOURNAL OF EXPERIMENTAL EDUCATION


correlation coefficients representing the regres- 
sion of ratings of post-experimental drawing abil- 
ity on two or more pre-experimental measures 
are significantly different from zero. Table VIII 
shows that none of these multiple regressions
accounts for significantly more variation in the
dependent variable than does the pre-experiment- 
al rating alone. 
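
The comparison summarized by Table VIII can be illustrated by forming each F ratio from the mean squares printed there. This is a sketch only: the F column itself is not legible in the source, so the ratios below are recomputed from the printed mean squares rather than quoted, and the row labels are paraphrases.

```python
# Mean squares as printed in Table VIII:
# (comparison, M.S. for the added predictor(s), residual M.S.)
table_viii = [
    ("X and Z1 versus X alone", 4.9447, 5.5835),
    ("X and Z2 versus X alone", 12.9657, 5.5033),
    ("X, Z1 and Z2 versus X alone", 7.7032, 5.5342),
]


def f_ratio(ms_added, ms_residual):
    """F ratio for the gain from adding predictors to the regression on X."""
    return ms_added / ms_residual


f_values = [f_ratio(added, residual) for _, added, residual in table_viii]
for (label, _, _), f in zip(table_viii, f_values):
    print(f"{label}: F = {f:.3f}")
```

The three ratios come out near 0.89, 2.36 and 1.39, which is in keeping with the statement that no added pre-experimental measure improves significantly on the pre-experimental rating alone.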

From these findings it was concluded that for 
subsequent analysis of whole plot measures of 
rating of post-experimental drawing ability, only 
a single independent variable was needed. The 
best single predictor has already been noted to be
that of the rating of pre-experimental drawing 
ability. Table IX presents the tests of signifi- 
cance involving these two variables for the split- 
plot measures. Here, also, the rating of pre- 
experimental drawing ability accounts for a sig- 
nificant proportion of the total variation among 
the ratings of post-experimental drawing ability. 

Lastly, Table X presents the complete analy- 
sis of variance and covariance for the ratings of 
post-experimental drawing ability when partial- 
ling out the effect of the ratings of pre-experi- 
mental drawing ability. There are six significant 
effects represented in it. After noting the pres- 
ence of replication effects, again, we limit the 
following remarks to the study of the factor of 
method and its interaction with that of drawing. 
They represent effects which are methodological- 
ly as well as statistically significant. 

The difference between the adjusted values for
the two methods is -0.448, which indicates the
superiority of method two over method one. It
should be noted that neither of the analyses of
variance (of ratings of pre-experimental drawing
ability and of ratings of post-experimental drawing
ability) demonstrated the existence of a significant
effect of method.

Appearance of the interaction between method 
and drawing classifications shows that the extent 
of method difference varies over the three types 
of drawings. Values of -0.807, 0.115, and -0.014
for still-life, landscape and figure drawing, re- 
spectively, focus attention on the conclusion that, 
without much doubt, the still-life drawing is ap- 
parently the only one for which a real difference 
in methods occurs. Comparison of its present 
magnitude with the unadjusted values points again 
to the inversion previously mentioned. This in- 
version is now more pronounced. 
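
The inversion can be checked directly from the three sets of method differences quoted in the text (method one minus method two). The short sketch below only tabulates those published figures and flags a change of sign; the helper name is invented for the illustration.

```python
drawings = ["still-life", "landscape", "figure"]

# Method one minus method two, as quoted in the text.
pre = [0.561, 0.549, -0.033]        # pre-experimental ratings
post = [-0.706, 0.213, -0.020]      # post-experimental ratings, unadjusted
adjusted = [-0.807, 0.115, -0.014]  # post-experimental, adjusted for pre-rating


def sign_reversed(before, after):
    """True when the difference changes sign between the two occasions."""
    return before * after < 0


for name, a, b, c in zip(drawings, pre, post, adjusted):
    print(f"{name:10s} pre {a:+.3f}   post {b:+.3f}   adjusted {c:+.3f}")

inverted_unadjusted = [n for n, a, b in zip(drawings, pre, post) if sign_reversed(a, b)]
inverted_adjusted = [n for n, a, c in zip(drawings, pre, adjusted) if sign_reversed(a, c)]
print(inverted_unadjusted, inverted_adjusted)
```

Only the still-life drawing changes sign, and its adjusted difference (-0.807) is larger in magnitude than the unadjusted one (-0.706).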


Summary and Conclusions 
For the four factors—method, class, draw- 
ing and rater—studied in this investigation, no 
main effects or interactions existed on the pre-
experimental measures of art judgment or aca- 
demic ability as measured by the Meier Art Judg- 
ment Test and the Ohio Psychological Test. 

Although the analysis of ratings of pre-exper- 
imental drawing ability indicated that there were 
mean differences for the three types of drawings 
among the four classes, the average over the 
three drawings did not vary from class to class. 

There were, however, differential levels for 
the method groups before beginning the experi- 
mental instruction. These levels were dependent 
upon the type of drawing rather than actually rep- 
resenting an average difference. Comparisons 
of method differences for each of the three types 
of drawing—still-life, landscape and figure— 
indicated the groups which were to receive train- 
ing under method one to be initially superior to 
the groups to receive training under method two 
for ability to make still-life and landscape draw- 
ings. The two groups were apparently of equal 
ability on figure drawing. 

Subsequent to training, method differences ap- 
peared in the form of an interaction with type of 
drawing. Here, the difference for the still-life 
drawing has been reversed to favor method two 
and at the same time the differences for the land- 
scape and figure drawings have lessened. 


Tests of differences for the several multiple 
correlation coefficients against the coefficient of 
zero order correlation between ratings of pre- 
experimental and post-experimental drawing 
ability were computed. These demonstrated that 
the ratings of pre-experimental drawing ability 
alone needed to be considered as a control vari- 
able. 


Completion of the analysis of covariance pro- 
duced an overall method difference while none 
was observed for the rating of either pre- or 
post-experimental drawing ability. This differ- 
ence was in favor of the group which received 
experimental instruction by method two. An in- 
teraction of the method and drawing factors leads 
to the conclusion that the real method difference 
is for the still-life drawing. In fact, by control- 
ling the rating of pre-experimental drawing abil- 
ity this difference for the post-experimental rat- 
ing was increased. 

In conclusion, it may be said that, in all cases, 
method two produced drawing results either superior
or equivalent to those produced by method
one. When considering additional factors, such 
as cost, this conclusion is of particular impor- 
tance. 
THE APPLICATION OF DISPERSION ANALYSIS
TO A POLITICAL PROBLEM


WILLIAM J. MOONAN 
University of Minnesota 


Introduction 


THE USEFULNESS of multivariate anal- 
ysis techniques in designing experiments is be- 
coming increasingly apparent to those who are 
actively participating in research dealing with ed- 
ucational, psychological and sociological data. 
Recently, Rao and Slater (20) have applied multi- 
variate techniques to a classification problem in- 
volving neurotic groups of soldiers. Unfortun- 
ately this article appears in a foreign journal 
which is inaccessible and/or unknown to many 
who should be aware of its contents. The purpose 
of this paper is to analyze a problem using meth- 
ods similar to those used by Rao and Slater and 
thereby to bring such techniques to the attention 
of a greater audience. 


The Origin and Development of Multivariate 
Theory 


The history of discriminatory analysis is no
less unique in its transition than that of many other more
common statistical methods in current use. Three
fundamental periods, reminiscent perhaps of the 
history of the theory of statistical hypotheses, 
can be traced. Certainly, eras exist where the 
influence of K. Pearson, R. A. Fisher and the 
Neyman-Wald group are predominant. There have 
been others who have made significant and lasting 
contributions. The most notable of these other 
researchers are those representing the Indian 
School of statisticians, namely, P. C. Mahalanobis,
R. C. Bose, S. N. Roy and C. R. Rao.

It was Karl Pearson’s Coefficient of Racial 
Likeness which was the first multivariate statis- 
tic to be studied. His ideas were introduced by 
Miss M. L. Tildesley (21) in a study using anthro- 
pological data. Briefly, the statistic was used 
to test the hypothesis that two samples could have 
been chosen from the same multivariate popula- 
tion. The data consisted of several measure- 
ments made on human crania. It had another, 
but unwarranted, use as a metric unit. Tildesley 
and others succumbed to the tantalization of using 
the empirical value as a measure of the distance 
between three or more populations considered 
pairwise. This practice had the effect of attrib- 
uting to the statistic features which it did not pos- 
sess. 

Early forms of the Coefficient of Racial Like- 
ness assumed that the variables involved were 
independent of each other and significance tests
were carried out with large samples so that normal
distribution theory would apply. A more generalized
form was introduced by Pearson (18) in
1926 to allow for dependency. However, the statistic
suffered from a deficiency: it was not consistent
when the populations being investigated were
actually different. It was this aspect which
prompted Mahalanobis to define a new statistic,
called D², or the Generalized Distance, which was
consistent under both the central and non-central
case.

At this point we note an interesting parallel
to some history associated with ‘‘Student.’’
Mahalanobis pointed out his criticism of the Coefficient
of Racial Likeness to Pearson (see 15),
who did not consider his argument worthwhile.
Undaunted, Mahalanobis (14) continued the investigation
and extension of D² and succeeded in finding
the first four moments of its random sampling
distribution for the case of dependent and independent
variables, with known variances and covariances,
under central and non-central conditions.
About the same time, Bose (1) obtained
the distribution function of the classical D² and
verified the truth of the results of sampling methods
employed by Mahalanobis. Immediately work
was started to eliminate the necessity of knowledge
of the population variances and covariances.
Bose and Roy (2) obtained the studentized distribution
under all conditions of centrality and dependency.
This removed the last major hindrance
to the practical use of D².

At the reading of this last paper, Fisher point- 
ed out that part of this work was redundant since, 
earlier, he had found the studentized distribution 
for the central case. Also, Fisher (6) showed 
that he, Mahalanobis and Hotelling were studying
statistics which had essentially served the same
function and had similar characteristics. Hotelling
(9) had been extending the work of ‘‘Student’’
and had defined a statistic, T², which bears a
direct proportional relationship to D² and which
has a central distribution that is equivalent to the
z distribution obtained by Fisher in 1921. This
statistic is particularly useful in linear discriminant
analysis since before a discriminant function
is set up, it must be determined if the two
samples come from the same multivariate normal
population. If they do, then there is nothing
to be gained by trying to effect any discrimination.

Fisher’s first published account of the linear 
discriminant function came in 1936. This statistic
arises in the problem of trying to assign correctly
a sample unit to one of two p-variate normal
populations from which it could have come. The
criterion for such assignment is based on maximizing
the ratio of the difference between the
sample means to the sample standard error within
the samples. By solving a set of simultaneous
linear equations, it is possible to find the coefficients,
$B_i$, of the function


$$Y = A + \sum_{i=1}^{p} B_i X_i$$


which will provide the desired classification, if 
it exists. For a more elaborate description of 
the theory and calculations involved, see John- 
son (10). Many applications have been made of 
the linear discriminant function. There have 
been a few ingenious approximation methods to 
reduce the labor involved in solving the necessary
equations. This problem is one which is
constantly a source of discouragement to anyone
using multivariate analysis. Any matrix system
which involves more than ten variables practically
prohibits the use of a desk calculator to make
inversions or to solve uniquely. Fortunately, the
introduction of electronic calculators into the 
major universities will assist tremendously in 
the solution of these problems. Also, the major- 
ity of the problems will not have numbers of var- 
iables which are too great. 
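
The simultaneous equations mentioned above can be sketched for a small case. Everything numerical here is hypothetical: the pooled within-sample matrix, the two mean vectors, and the choice of the constant A as minus the midpoint score are assumptions made for illustration, and Gaussian elimination stands in for whatever desk method a worker might have used.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * b = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def score(constant, coefficients, x):
    """Y = A + sum of B_i * X_i."""
    return constant + sum(b * v for b, v in zip(coefficients, x))


# Hypothetical two-variable example (all numbers are assumptions).
pooled_within = [[2.0, 1.0], [1.0, 3.0]]
mean_one, mean_two = [3.0, 4.0], [2.0, 2.0]
difference = [a - b for a, b in zip(mean_one, mean_two)]

coefficients = solve_linear_system(pooled_within, difference)
midpoint = 0.5 * (score(0.0, coefficients, mean_one) + score(0.0, coefficients, mean_two))
constant = -midpoint  # assumed convention: Y > 0 assigns the unit to population one

print(coefficients, constant)
print(score(constant, coefficients, [3.0, 3.0]) > 0)
```

With this convention a new unit is assigned to the first population when its score is positive. The work grows quickly with the number of variables, which is the labor the text refers to.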

There exists another limitation to the proper 
use of the linear discriminant function. Origin- 
ally this statistic was defined to classify two pop- 
ulations, but it has long been used for k>2. Its 
efficiency for this purpose is dependent upon the 
fact that the p means of the variables in the k pop- 
ulations are proportional. This assumption is 
known as collinearity and tests for this must be 
made if only one linear discriminant function is 
to be utilized in classification. 

The statistical intuition of Fisher prompted 
his definition of the linear discriminant function. 
This insight has preceded many important statis- 
tical developments, but such actions have often 
been unsatisfactory to those who wish to develop 
a theory of classification. A beginning in this 
direction was made in a brief note by Welch (24). 
The intention here is not to give a detailed de- 
scription of Welch’s results or those of von Mises 
(17), who elaborated upon his work. It suffices 
to say that they were interested in finding a rule, 
using the observations on p traits, to partition 
the sample space which characterizes the popu- 
lation from which a sample unit could have arisen. 
This work marked the introduction of the mini- 
max principle of statistical decision functions 
(see Wald, 23) into classification problems. Extensions
of this idea, and others, have been given
by Wald (22), Rao (19), and Brown (3). These
papers have given, and are giving, impetus to a
more complete classification theory whose bounds
are not currently near attainment. Johnson (11)
has given a general discussion of the application 
of multivariate theory and a recent account of 
researches in multivariate analysis was report- 
ed by Johnson and Moonan (12). 


The Data and Its Analysis 


The data of this analysis relate to the first 
session roll calls of the United States Senate, 
80th session, 1947, and were obtained from a 
League of Women Voters publication (13). These 
data were factor analyzed by Harris (8) and we 
use some of his results. Factor analysis has 
served many controversial purposes in its short 
history. Least arbitrary of these is its ability
to reduce abundant numbers of variables to homogeneous
groups. It is in this spirit that this
technique has been used and not as an end in itself.
The fact that Rao and Slater also used a
factor technique to reduce their basic data should
not be construed to mean that such a technique need
always (or ever) precede dispersion analysis. Single
variables may be employed as components
of the multivariate vectors and most probably
would be. An interesting related problem, about
which this author has seen no general discussion, 
concerns the resolution of a predicament that a 
researcher faces when designing an experiment. 
Should he use a variable as a dimension of the 
design, as a covariance adjuster or as a depend- 
ent multivariate? However, an attempt to answer 
this question will not be made in this paper. 

Before condensation into three variables, our 
data consisted of ten political bills selected for 
enumeration because of their importance: 


1. Taft motion to second Overton motion to seat
Senator Bilbo (passed 39 to 19).

2. Tydings amendment to continue the investigation
of the National defense program under
a standing committee rather than under a
special committee (defeated 47 to 45).

3. Bricker motion to recommit the nomination
of Lilienthal as Chairman of the Atomic Energy
Commission (defeated 58 to 32).

4. Greek-Turkish Aid bill (passed 67 to 23).

5. Taft-Hartley labor bill (passed 68 to 24).

6. Kem amendment to cut foreign relief funds
from $350 million to $200 million (defeated
64 to 19).

7. Income tax reduction bill (passed 52 to 26).

8. A bill relating to rent control (passed 48 to
26).

9. A bill providing price support and authorizing
increased prices paid for imported wool
(passed 48 to 38).

10. Resolution to disapprove consolidating the
housing functions of the government (defeated
47 to 38).
Harris concluded that there existed three factors
which we call A, B and C. He explicitly defines
A and B, but suggests a dual meaning for C. The 
factors represent points of view of the Senators 
which he defines as relating to: 


A. Big business and management. 

B. The U.S. relationship with foreign coun- 
tries. 

C. The protection of home industries or the 
interests of agriculture as opposed to in- 
dustry. 


Note should be taken of the fact that bill 1 was 
not included in the factor analysis since too few 
Senators were seated when the motion was intro- 
duced. In its place we substitute the party affil- 
iation of the Senator—either Democrat or Repub- 
lican. As Senator Bilbo was not seated, only 95 
Senators are included. 

The first factor, A, was made up of bills 2, 

3, 5, 7, 8 and the political party. The second 
factor consisted of two bills 4 and 6, and C orig- 
inates from 9 and 10. The factorization and the 
designation of its results are somewhat arbitrary. 
This is true in view of the possible failure of the
variables to fulfill the assumptions of tetrachoric 
correlation, the lack of uniqueness of the anal- 
ysis and the semantic difficulties. As this paper 
is an illustrative exposition of what can be done 
with data of this type by utilizing powerful statis- 
tical methods, not too much serious objection 

can arise if we proceed as intended with the math- 
ematics while not stressing any validity for the 
results. 

We have been given observations on each of 
three variables obtained from votes of Senators 
on certain bills. We group the Senators into four 
sections on the basis of the geographical location 
of the states they represent. This segregation 
was accomplished by utilizing no known biases. 
States which could be assigned to two or more 
sections were assigned randomly. Missing observations
were also filled in randomly. Let
$X_{ij\alpha}$ = the $\alpha$th observation on the ith factor of a
Senator from the jth section,


$$i = 1, 2, 3; \quad j = 1, 2, 3, 4; \quad \alpha = 1, 2, \ldots, N_j;$$

$$N = N_1 + N_2 + N_3 + N_4.$$

In this analysis we assume that variances and
covariances of the factors for each section are
equal. A test of this hypothesis is given by Wilks
(25). We now make the hypothesis, $\mu_{ij} - \mu_{ij'} = 0$
for all i. The object of testing this hypothesis is
to establish that the sections differ on the factors;
otherwise there is no reason to measure the divergences
of the sections from each other and to
set up discriminants between them. The likelihood
ratio for testing this hypothesis is


$$W = \frac{|\{a_{ih}\}|}{|\{c_{ih}\}|} = \frac{\text{Determinant of the ``Within Sections'' matrix}}{\text{Determinant of the ``Total'' matrix}} \tag{1}$$


where W is the ratio of the determinants of two
matrices having Wishart distributions. We will use
a limiting test suggested by Bartlett which makes
use of the χ² distribution for large values of the
degrees of freedom for $\{a_{ih}\}$. The analytical
definitions of the matrices involved in the
calculations are:


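$$\{a_{ih}\} = \sum_{j=1}^{4} \sum_{\alpha=1}^{N_j} (X_{ij\alpha} - \bar{X}_{ij})(X_{hj\alpha} - \bar{X}_{hj}), \qquad i, h = 1, 2, 3 \tag{2}$$


$$\{c_{ih}\} = \sum_{j=1}^{4} \sum_{\alpha=1}^{N_j} (X_{ij\alpha} - \bar{X}_{i})(X_{hj\alpha} - \bar{X}_{h}), \qquad i, h = 1, 2, 3 \tag{3}$$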


In this notation, $\bar{X}_{ij}$ is the mean of the ith factor
in the jth section and $\bar{X}_{i}$ is the mean of the ith factor
in all sections. The numbers in the i factors
are the scores compounded on each of the bills 
(or party) in each of the factors. As an illustra- 
tion of the data, consider the responses of a Sen- 
ator from Alabama (see page 285). 

A score of 1 was assigned to each bill that the
Senator voted for, otherwise a 0 was entered. If
he was a Republican, he was given a 1 also. The
total scores for each factor appear as totals under
the entry $X_{i11}$, i.e., the first score of a Senator
from the first section (South) on the ith factor.
In the example the values are 2, 0 and 1.
These scores were calculated for each Senator
for each section. From their values it is possible
to obtain easily the matrices $\{c_{ih}\}$ and $\{a_{ih}\}$.
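
The scoring rule and the two matrices of equations (2) and (3) can be sketched as follows. The factor membership follows the text (bills 2, 3, 5, 7, 8 and party for A; bills 4 and 6 for B; bills 9 and 10 for C), but the senator's votes and the small two-factor data set are hypothetical and serve only to show the arithmetic.

```python
# Bills scored under each factor; party (Republican = 1) is added to factor A.
FACTOR_BILLS = {"A": [2, 3, 5, 7, 8], "B": [4, 6], "C": [9, 10]}


def factor_scores(votes, republican):
    """votes maps bill number -> 1 (voted for) or 0; returns [A, B, C] scores."""
    scores = [sum(votes.get(b, 0) for b in FACTOR_BILLS[f]) for f in "ABC"]
    scores[0] += 1 if republican else 0
    return scores


def mean_vector(rows):
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(len(rows[0]))]


def sscp(rows, center):
    """Sums of squares and cross-products of the rows about a given center."""
    p = len(center)
    out = [[0.0] * p for _ in range(p)]
    for r in rows:
        dev = [r[i] - center[i] for i in range(p)]
        for i in range(p):
            for h in range(p):
                out[i][h] += dev[i] * dev[h]
    return out


def within_and_total(sections):
    """Return the within-sections matrix (2) and the total matrix (3)."""
    p = len(sections[0][0])
    within = [[0.0] * p for _ in range(p)]
    for rows in sections:
        part = sscp(rows, mean_vector(rows))
        for i in range(p):
            for h in range(p):
                within[i][h] += part[i][h]
    everyone = [r for rows in sections for r in rows]
    return within, sscp(everyone, mean_vector(everyone))


# Hypothetical senator (assumption): a Democrat voting for bills 2, 5 and 9 only.
example_scores = factor_scores({2: 1, 5: 1, 9: 1}, republican=False)

# Hypothetical two-factor scores for two small sections (assumption).
toy_sections = [[[1, 0], [3, 2]], [[2, 2], [4, 4]]]
a_matrix, c_matrix = within_and_total(toy_sections)
print(example_scores, a_matrix, c_matrix)
```

The difference between the total and within matrices is the between-sections matrix, which is why the two determinants in (1) can differ only when the section means differ.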


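We find that

$$\{c_{ih}\} = \begin{pmatrix} 321.6211 & 15.3053 & -26.2211 \\ 15.3053 & 51.7263 & -5.9053 \\ -26.2211 & -5.9053 & 23.6211 \end{pmatrix} \tag{4}$$


and


$$\{a_{ih}\} = \begin{pmatrix} 231.3656 & 15.5361 & -24.4091 \\ 15.5361 & 50.7285 & -6.0985 \\ -24.4091 & -6.0985 & 22.5379 \end{pmatrix} \tag{5}$$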


Previously we agreed to test the hypothesis, $\mu_{ij} = \mu_{ij'}$
for all i, by a limiting χ² test. The ratio,


$$W = \frac{|\{a_{ih}\}|}{|\{c_{ih}\}|} = \frac{246340}{\ldots} = .6731.$$
Bartlett's test evaluates χ² as

$$\chi^2 = -\left[n - \tfrac{1}{2}(p + q + 1)\right] \log_e W,$$

with pq degrees of freedom, where n is the number
of degrees of freedom for ‘‘Total’’, p is the number
of factors, and q is the number
of degrees of freedom for ‘‘Between Sections’’.

In our particular example we have n = 94, p = 3
and q = 3. Therefore χ² = 35.825 with 9 d.f.
This value is significant at the 1% level so we reject
the hypothesis. This result is equivalent to
saying that the sections of the country differ in
respect to the factors. An inquiry into the nature
of this difference will now be undertaken.
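
The arithmetic of this test is easily checked. The sketch below applies Bartlett's approximation to the published values W = .6731, n = 94, p = 3 and q = 3; the function name is invented for the illustration, and the 1% point of χ² for 9 d.f. (21.666) is taken from standard tables rather than from the paper.

```python
import math


def bartlett_chi_square(w, n, p, q):
    """Bartlett's limiting chi-square for the likelihood ratio W.

    Returns the statistic -[n - (p + q + 1)/2] * ln(W) and its p*q
    degrees of freedom.
    """
    statistic = -(n - 0.5 * (p + q + 1)) * math.log(w)
    return statistic, p * q


chi_square, dof = bartlett_chi_square(w=0.6731, n=94, p=3, q=3)

# Tabled 1% point of chi-square with 9 degrees of freedom.
CRITICAL_1_PERCENT_9_DF = 21.666

print(f"chi-square = {chi_square:.3f} with {dof} d.f.")
print("reject" if chi_square > CRITICAL_1_PERCENT_9_DF else "accept")
```

The computed value agrees with the 35.825 reported above and lies well beyond the tabled 1% point.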


The Calculation and Interpretation of the General- 
ized Distance 


A brief history of Mahalanobis’ Generalized
Distance Function was given in an earlier section
of this paper. Having established in these notes
the concepts of D², we shall now embark upon its
calculation, remembering that our purpose is to
find a metric unit denoting the divergence between
the sections of the country. This aspect utilizes
D² as a classification statistic because, in effect,
we determine the relation of a population to a set
of other populations.

The definition of D? is usually given as 


3 4 


The meaning of Xij (and thus, Xpj) has been set 
forth previously, and Sih js simply ‘the inverse 
matrix of {Sin} = {Ajn} /91. Rao (16) has shown 
a method whereby the inverse need not be calcu- 
lated, but for a small number of variables, as we 
have here, it may as well be. This author finds 
it convenient to find D? by means of matrix mul- 
tiplication. This method will now be shown. We 
-. 08145 


calculate, 
gi! gt? . 46272 
S21 $22 g237 = (- 1. 86898 . 41755 ?(7) 
$31 §32 .41755 4.65181 

$D^2$'s are found between two sections at a time.
Let us calculate the distance between the East
and Midwest. We need for this a summary of the
means of each of the factors. This information
is provided in Table I.

From the data of Table I, form the $\{d_i\}$ and
$\{d_h\}$ matrices:

$$\{d_i\} = \begin{pmatrix} -.7841 \\ -.2651 \\ -.1288 \end{pmatrix}, \qquad \{d_h\} = \{-.7841 \quad -.2651 \quad -.1288\} \qquad (8)$$

whose elements are the differences $\bar{X}_{ij} - \bar{X}_{ij'}$,
for i = 1, 2, 3 and j ≠ j'. Having found $\{d_i\}$, find
the product matrix, $\{s^{ih}\}\{d_i\}$, and from it calculate
$D^2$, which equals $d_h s^{ih} d_i$. In our case
we find

$$\{s^{ih}\}\{d_i\} = \begin{pmatrix} -.38898 \\ -.48538 \\ -1.07266 \end{pmatrix} \qquad (9)$$

and $D^2$ = .5718. This arithmetic is especially
simple with a Friden full-automatic calculating
machine. The $D^2$'s for all combinations of pairs
are given in Table II. The significance of these
values may be tested by using the formula,


$$F = \frac{N_j N_{j'}}{N_j + N_{j'}} \cdot \frac{m - p + 1}{m p} \, D^2 \qquad (10)$$

where m is the number of degrees of freedom of
$\{A_{ih}\}$, all other terms being previously defined.
For the East-Midwest comparison,

$$F = \frac{(22)(24)}{22 + 24} \cdot \frac{89}{(91)(3)} \, (.5718) = 2.14$$

which, for d.f. 3 and 89, is not significant at the
5% level. The F's for all combinations of sections
are shown in Table II.
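
The East-Midwest calculation can be retraced with a minimal sketch. It assumes the inverse matrix of (7), the difference vector of (8) with third element -.1288, and the reading of formula (10) as [N_j N_j' / (N_j + N_j')] [(m - p + 1) / (mp)] D^2 with section sizes 22 and 24. That reading is inferred from the surviving fragments of the printed formula and from the stated degrees of freedom, so it should be treated as an assumption.

```python
# Inverse dispersion matrix (7) and East-minus-Midwest mean differences (8).
S_INV = [
    [0.44761, -0.08145, 0.46272],
    [-0.08145, 1.86898, 0.41755],
    [0.46272, 0.41755, 4.65181],
]
DIFF = [-0.7841, -0.2651, -0.1288]


def mat_vec(matrix, vector):
    """Multiply a square matrix by a column vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def generalized_distance(s_inv, diff):
    """D^2 = d' S^{-1} d, computed through the intermediate product (9)."""
    product = mat_vec(s_inv, diff)
    return sum(a * b for a, b in zip(diff, product)), product


def f_ratio(d_sq, n_j, n_k, m, p):
    """F with p and m - p + 1 degrees of freedom (assumed reading of (10))."""
    return (n_j * n_k / (n_j + n_k)) * ((m - p + 1) / (m * p)) * d_sq


d_sq, product = generalized_distance(S_INV, DIFF)
f_value = f_ratio(d_sq, n_j=22, n_k=24, m=91, p=3)
print([round(x, 5) for x in product], round(d_sq, 4), round(f_value, 2))
```

Under these assumptions the intermediate product reproduces (9), and the distance and F agree with the .5718 and 2.14 quoted in the text.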

Only two of the six distances are not significant
at the 5% level. This shows that the Eastern-
Midwestern and Western-Southern political sentiments
are similar in regard to the issues involved
in this study. From Table II it is apparent
that the sections are widely separated. If there
were only one component which accounts for the
differences between the sections, we could draw it as
an axis upon which points representing the sections
could be noted at appropriate distances from
each other. The sections which differ from each
other would do so in proportion to the degree of
separation. We have no reason to believe there
are not more components, however. If this be
the case, then our graph would have more dimensions
and a single component would not be
appropriate.

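Fisher (7) has provided a procedure to test
whether a single component is warranted. A
function is fitted so that it gives the line of best
fit to the means of the sections in the three-dimensional
space provided by our factors. This line has three
direction cosines which need to be found. To this
end we must solve the characteristic equation of
the following determinant:

$$\begin{vmatrix} 90.2555 - \theta(231.3656) & -.2308 - \theta(15.5361) & -1.8120 - \theta(-24.4091) \\ -.2308 - \theta(15.5361) & .9978 - \theta(50.7285) & .1932 - \theta(-6.0985) \\ -1.8120 - \theta(-24.4091) & .1932 - \theta(-6.0985) & 1.0832 - \theta(22.5379) \end{vmatrix} = 0 \qquad (11)$$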
TABLE II
SUMMARY OF $D^2$'s AND (F) VALUES OF ALL COMBINATIONS OF SECTIONS


Section    Midwest         South               West
East       .5718 (2.14)    1.1746 (4.79)**     .9381 (3.20)*
Midwest                    2.6169 (11.21)**    1.7912 (6.37)**
South                                          .3556 (1.37)


 * Denotes significance at the 5% level.
** Denotes significance at the 1% level.


TABLE III
MEAN VALUES OF THE CANONICAL VARIATES,
$\lambda_1$ AND $\lambda_2$, FOR EACH SECTION


           Canonical Variates
Section    $\lambda_1$      $\lambda_2$
East       .3352     .1923
Midwest    .3988     .2329
South      .2395     .2123
West       .2671     .2493
The elements of (11) are found by subtracting $\theta$
times the matrix of (5) from the difference matrix
of (4) and (5). This last matrix, obtained
by differencing, is the "Between Sections" matrix.
Finding the characteristic equation of the
determinant of (11) is computationally long if
done directly. It is simpler to premultiply the
"Between Sections" matrix by the matrix (7) and
subtract this matrix:

$$\begin{pmatrix} 91\theta & 0 & 0 \\ 0 & 91\theta & 0 \\ 0 & 0 & 91\theta \end{pmatrix} \qquad (12)$$

This action leads to the following determinant,
wherein $\mu$ is equal to $91\theta$.


$$\begin{vmatrix} 39.5796 - \mu & -.0952 & -.3256 \\ -8.5393 & 1.9643 - \mu & .9610 \\ 33.2358 & 1.2086 & 4.2811 - \mu \end{vmatrix} = 0 \qquad (13)$$
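
The premultiplication described above can be retraced numerically. The sketch below assumes the matrices (4), (5) and (7) as reconstructed from the scan; it forms the "Between Sections" matrix by differencing and premultiplies it by (7). Because (7) is rounded to five decimals, the products agree with the printed elements of (13) only to about three decimals.

```python
# Matrices (4), (5) and the inverse matrix (7), as read from the paper.
C = [
    [321.6211, 15.3053, -26.2211],
    [15.3053, 51.7263, -5.9053],
    [-26.2211, -5.9053, 23.6211],
]
A = [
    [231.3656, 15.5361, -24.4091],
    [15.5361, 50.7285, -6.0985],
    [-24.4091, -6.0985, 22.5379],
]
S_INV = [
    [0.44761, -0.08145, 0.46272],
    [-0.08145, 1.86898, 0.41755],
    [0.46272, 0.41755, 4.65181],
]


def mat_sub(x, y):
    """Element-wise difference of two matrices of equal shape."""
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def mat_mul(x, y):
    """Ordinary matrix product x times y."""
    cols = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


between = mat_sub(C, A)            # "Between Sections" matrix
product = mat_mul(S_INV, between)  # its diagonal, less mu, appears in (13)
trace = sum(product[i][i] for i in range(3))

for row in product:
    print([round(v, 4) for v in row])
print("trace:", round(trace, 4))
```

The trace of the product equals the sum of the roots of the characteristic equation, which gives a quick check on the expansion that follows.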


Expanding (13) by the usual methods, we get the
characteristic equation,

$$\mu^3 - 45.8250\mu^2 + 266.7699\mu - 304.9653 = 0.$$

The roots may be determined by a variety of
methods and the author used a trigonometric
scheme shown in Dickson's book (5). The roots
are: 39.2219, 5.0673 and 1.5358. The sum of
the roots is 45.8250. This value has pq = 9 degrees
of freedom and constitutes their total variation.
Considering this as a $\chi^2$, we have an
equivalent large sample test for the hypothesis
which we earlier rejected. To test the
hypothesis of collinearity we split the total variation
into two groups, one corresponding to the
variation of the largest root and the other made
up of the sum of the smaller variations. Thus
45.8250 = 39.2219 + 6.6031 and the distribution
of the degrees of freedom is pq = (p + q - 1) +
[(p + q - 3) + (p + q - 5)] = 5 + 4. Treating each
term as a $\chi^2$, we find that only the last, or residual,
term is not significant. This establishes
that there is only collinearity. If this last $\chi^2$ had
been significant, this fact would be indicative of
at least two dimensional coplanarity. The tests
may be extended to determine what degree of
coplanarity is justifiable.
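
The root-finding step can be sketched as follows. The code uses the standard trigonometric solution of a cubic with three real roots, which may or may not be the exact scheme given in Dickson's book, and applies it to the characteristic equation as printed. The roots it returns agree with the quoted values to about two decimals; the remaining differences presumably reflect desk-calculator rounding.

```python
import math


def three_real_roots(a, b, c):
    """Roots of x^3 + a x^2 + b x + c = 0 when all three roots are real.

    Substituting x = t - a/3 gives t^3 + p t + q = 0, which is solved by
    t_k = r cos((phi - 2 pi k) / 3) with r = 2 sqrt(-p / 3).
    """
    p = b - a * a / 3.0
    q = 2.0 * a ** 3 / 27.0 - a * b / 3.0 + c
    if 4.0 * p ** 3 + 27.0 * q ** 2 >= 0.0:
        raise ValueError("trigonometric form needs three distinct real roots")
    r = 2.0 * math.sqrt(-p / 3.0)
    phi = math.acos(3.0 * q / (p * r))
    roots = [r * math.cos((phi - 2.0 * math.pi * k) / 3.0) - a / 3.0
             for k in range(3)]
    return sorted(roots, reverse=True)


# Characteristic equation as printed in the text.
roots = three_real_roots(-45.8250, 266.7699, -304.9653)

# Partition of the total variation and of the pq = 9 degrees of freedom.
p_factors, q_between = 3, 3
largest, residual = roots[0], sum(roots[1:])
df_largest = p_factors + q_between - 1
df_residual = (p_factors + q_between - 3) + (p_factors + q_between - 5)

print([round(x, 4) for x in roots])
print(round(largest, 4), df_largest, round(residual, 4), df_residual)
```

Each part of the partition is then referred to a chi-square table with its own degrees of freedom, as described above.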

Our calculations have shown that only the first 
root is significant. Even so, we shall proceed 
to calculate canonical variates corresponding to 
two roots. This act serves two purposes. We 
get a better description of the nature of the data 
and a suggestion for a further investigation. 

In order to find the two canonical variates, or
axes of the dimension space, we find the direction
cosines corresponding to the first two roots,
standardize them, and compound them into linear
functions of the factors. The direction cosines,
$k_i$, for the first canonical variate are found by
solving the following system of equations:

$$\begin{aligned} (39.5796 - 39.2213)k_1 - .0952k_2 - .3256k_3 &= 0 \\ -8.5393k_1 + (1.9643 - 39.2213)k_2 + .9610k_3 &= 0 \\ 33.2358k_1 + 1.2086k_2 + (4.2811 - 39.2213)k_3 &= 0 \end{aligned} \qquad (14)$$


The terms of (14) are comprised of the elements
of (13) from whose diagonal terms is subtracted
the first root. As we need be interested only
in proportional solutions, $k_3$ may be set arbitrarily
equal to unity. Resultant calculations show
$k_1$ = 1.0434 and $k_2$ = .2168. The variance of the
canonical variate, $k_1 A + k_2 B + k_3 C$, is

$$\sum_{i=1}^{3} \sum_{h=1}^{3} k_i A_{ih} k_h = 249.3095.$$

Dividing each term of the first canonical variate
by the square root of this variance, we obtain the
first standardized canonical variate, $\lambda_1$ = .0661A +
.0137B + .0633C. By substituting the second
root, 5.0737, for 39.2213 in (14) and solving as
before, we determine the second standardized
canonical variate to be $\lambda_2$ = .0018A + .0636B +
.1726C.

If we substitute the mean values on the three
factors for each of the sections in $\lambda_1$ and $\lambda_2$, we
obtain the entries of Table III.
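
The substitution can be illustrated for one section. The sketch below uses the standardized coefficients quoted above and the Southern means on the three factors (2.5517, .55175 and 1.0000), which are the values that appear with the discriminant calculation later in the paper. Means for the other sections come from Table I and are not repeated here.

```python
# Standardized canonical variate coefficients for factors A, B and C.
LAMBDA_1 = (0.0661, 0.0137, 0.0633)
LAMBDA_2 = (0.0018, 0.0636, 0.1726)

# Southern section means on factors A, B and C, as used for (17).
SOUTH_MEANS = (2.5517, 0.55175, 1.0000)


def variate_score(coefficients, values):
    """Linear compound of the factor values with the given coefficients."""
    return sum(c * v for c, v in zip(coefficients, values))


south_1 = variate_score(LAMBDA_1, SOUTH_MEANS)
south_2 = variate_score(LAMBDA_2, SOUTH_MEANS)
print(round(south_1, 4), round(south_2, 4))
```

The two scores agree with the South entries of the table of canonical variate means (.2395 and .2123).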

Figure 1 shows a plot of these means and pictorially
represents the data in the two dimensional
standardized canonical variate space. Having
found the canonical variates, it is desirable to
name them if possible. It is at this point that the
troubles faced by a factor analyst are met. In
this case, however, we have the advantage of knowing
which canonical variates are worth considering
and, besides, we have equations for them.
Embodied in these equations are the values of the
standardized direction cosine weights. These values
help us. In the particular case we are considering,
any score on $\lambda_1$ is primarily made of
contributions from factors A and C, while those of
$\lambda_2$ are comprised mostly of C with some contribution
made by B. We must keep in mind, however,
the magnitude of the numbers to be substituted
for A, B and C. On the average, they will
be about 3.5, .5 and 1.0. Thus even though the
standardized direction cosines for A and C of $\lambda_1$
are about equal, the contribution due to A will be
the greatest. This evidence, and that provided
by the nature of the bills in A, suggests that the
first standardized canonical variate is concerned
primarily with an attitude about government regulation
and control of economic and social life.
The Midwest is most favorable and the South least
favorable to such administration. Notice that all
sections made positive registrations on this variate.

Recall that the variation due to the second root
was not statistically significant. We have carried
it along for descriptive purposes. Also the second
canonical variate is difficult to name. The
trouble seems to stem from the fact that bills 9
and 10 appear to have no common basis. Scrutiny
of the original data leads to the conclusion
that the West and Midwest are most favorable
toward bill 9. The second variate may be an attitude
concerned with government subsidy or there
may actually be two components embodied in it.

Even though it was very insignificant, curiosity
led the author to the evaluation of $\lambda_3$ = .0001A -
.1988B + .0565C. Substituting the means of factors,
we get East: -.0089, Midwest: -.0543,
South: -.0529 and West: -.0186. Factor B is apparently
most important. As this factor is made
up of bills 4 and 6 which relate to foreign aid,
$\lambda_3$ is thus characterized. The order of the sections
is certainly congruent with common knowledge.
The coastal areas are generally known to
be most favorable and the Midwest least to foreign
aid. Actually the designation of $\lambda_1$, $\lambda_2$ and $\lambda_3$ is
not all important. These equations may be considered
as ancillary functions which maximize
the information inherent in the variability of the
data and their specification need not be, or may
not be, politically meaningful. This specification,
however, is usually psychologically satisfying
to the researcher.

Rao (20) has shown how to establish a criter- 
ion to determine which section a Senator belongs 
to by using the canonical variates. The establish- 
ment of such a criterion is not so important in 
this problem unless one would desire to ascer- 
tain whether a politician aspiring to election 
would represent the section of the country of 
which his state is a member. The method will 
be shown because it is very useful in more com- 
mon situations. 

The procedure we follow is to calculate the
discriminants

$$L_j = l_1 A + l_2 B + l_3 C - \tfrac{1}{2} \sum_{i=1}^{3} l_i \bar{X}_{ij} + \log_e \Pi_j \qquad (15)$$

where

$$l_i = \sum_{h=1}^{3} s^{ih} \bar{X}_{hj}, \qquad i = 1, 2, 3 \qquad (16)$$

and $\Pi_j$ is the relative frequency of the number of
Senators in the jth section. These values are assumed
to be accurately known, and they are in
this case, since the number of Senators from the
states is fixed by law. We have for the Southern Section,

$$\begin{pmatrix} l_1 \\ l_2 \\ l_3 \end{pmatrix} = \begin{pmatrix} .44761 & -.08145 & .46272 \\ -.08145 & 1.86898 & .41755 \\ .46272 & .41755 & 4.65181 \end{pmatrix} \begin{pmatrix} 2.5517 \\ .55175 \\ 1.0000 \end{pmatrix} = \begin{pmatrix} 1.5600 \\ 1.2408 \\ 6.0629 \end{pmatrix} \qquad (17)$$

while $\tfrac{1}{2} \sum_{i=1}^{3} l_i \bar{X}_{ij}$ = 5.3641 and $\Pi_S$ = 30/96 =
.3125; thus $L_S$ = 1.5600A + 1.2408B + 6.0629C -
6.5272. The discriminants for the other sections
similarly calculated are: $L_W$ = 1.7727A + 1.1390B
+ 7.2751C - 8.8095, $L_E$ = 2.2245A + .6641B +
6.4428C - 9.1482, and $L_M$ = 2.6359A + 1.1454B +
7.5386C - 12.3074. Given a new random observation
on the three factors we substitute in the
$L_j$ and find the highest numerical value. The section
corresponding to this number is that section
to which the observation should be classified.
The criterion of classification is that which maximizes
the minimum probability of misclassification.

Although this practice is not defensible, we
will, as an example, substitute in the $L_j$ the observation
of the Senator from Alabama whose
scores were previously given as A = 2, B = 0,
and C = 1. Substitution yields $L_S$ = 2.6557, $L_W$ =
2.0110, $L_E$ = 1.7436 and $L_M$ = .5030. As $L_S$ was
the maximum, the observation is classified as
Southern. Most misclassifications are liable to
occur between the sections where Generalized
Distances are not significant.
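
The classification rule can be sketched with the four discriminants exactly as quoted in the text. The dictionary layout and function names are illustrative only.

```python
# Discriminant coefficients (A, B, C, constant) for each section, as quoted.
DISCRIMINANTS = {
    "South": (1.5600, 1.2408, 6.0629, -6.5272),
    "West": (1.7727, 1.1390, 7.2751, -8.8095),
    "East": (2.2245, 0.6641, 6.4428, -9.1482),
    "Midwest": (2.6359, 1.1454, 7.5386, -12.3074),
}


def discriminant_scores(a, b, c):
    """Evaluate every L_j for one observation on factors A, B and C."""
    return {
        section: la * a + lb * b + lc * c + const
        for section, (la, lb, lc, const) in DISCRIMINANTS.items()
    }


def classify(a, b, c):
    """Assign the observation to the section with the largest L_j."""
    scores = discriminant_scores(a, b, c)
    return max(scores, key=scores.get), scores


# The Senator from Alabama: A = 2, B = 0, C = 1.
section, scores = classify(2, 0, 1)
print(section, {name: round(value, 4) for name, value in scores.items()})
```

For the Alabama observation the four scores agree with the values given above, and the largest is the Southern discriminant.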


Summary 


This paper was concerned with the dispersion 
analysis of some political data. It was ascertain- 
ed that designated sections of the United States 
differed significantly with respect to certain fac- 
tors whose elements consisted of votes of Sena- 
tors on nine bills and their political party. Gen- 
eralized distances, calculated between the sec- 
tions enabled a designation of how far the sections 
departed from each other. Certain distances 
were found insignificant. A canonical analysis 
was undertaken and it was determined that the 
variation around a linear function through the 
means of the factors for each section was such 
that an hypothesis of collinearity was accepted. 
Even so, the two dimensional analysis was con- 
tinued for descriptive and suggestive purposes. 
The canonical variates were described and named. 
A final analysis was made to establish a criter- 
ion to classify an individual to a most appropri- 
ate section. 

The author is indebted to Rao and Slater's original
work and to the suggestions afforded by a
factor analysis made by Harris. Dispersion analysis
offers useful and unique techniques for researchers
in the social sciences. A similar analysis
to this one might be undertaken with the
international hierarchies suggested by Cattell (4)
using the political data generated by the United
Nations General Assembly and/or the Security
Council. The results of such analysis would
prove both instructive and interesting. Both psychology
and education abound with classification
problems which need to be attacked with these
rigorous techniques. This article in itself should
be suggestive of other uses.

A history of multivariate analysis was given
in an introductory section of the paper.
A COMPARISON OF THE CONVENTIONAL
AND DEMONSTRATION METHODS IN 
THE ELEMENTARY COLLEGE 
PHYSICS LABORATORY 


HAYM KRUGLAK 
University of Minnesota 


1. The Problem and Its Significance 


THE OBJECTIVE of this experiment was 
to compare the learning outcomes of two instruc- 
tional methods in the elementary college physics 
laboratory: the individual and the demonstration. 
The experiment was carried out so as to test the 
following three null hypotheses: 


a. There are no differences in learning out- 
comes of students who perform laboratory 
experiments in elementary college physics 
by the conventional method and students for 
whom the same experiments are demon- 
strated by the laboratory instructor. 


b. There are no differences in learning outcomes
of students in elementary college
physics that are traceable to the differences
among the laboratory instructors.


c. There are no differences in learning outcomes
of students in elementary college
physics that are assignable to the interaction
between the instructors and the methods.


Though there have been many investigations
dealing with the lecture-demonstration versus
the individual method of laboratory instruction,
the problem of the most appropriate method is
far from solved. The evaluation of learning outcomes
is particularly important in the elementary
college physics laboratory. Considerable
personnel, time, space, and apparatus are allotted
to the operation of the undergraduate laboratories,
yet the research on their instructional
effectiveness is practically non-existent. A survey
of the literature shows that most of the related
studies had serious flaws in experimental design
and analytical treatment so that no valid conclusions
could be drawn from the published data.


2. The Design of the Experiment 


A 2 x 4 randomized block with equal subclasses
was used to obtain the answers to the questions
raised by the problem. Four instructors
were selected with each assigned at random to
one control and one experimental laboratory section.
The control groups were taught by the conventional
method; the demonstration method was
used with the experimental groups. The students
were assigned to the control and experimental
groups by the random sampling number technique.
Thus the design satisfied the criteria of replication,
control, and randomization, the essential
requisites of a self-contained experiment.

It is a well recognized fact among statistic- 
ians that the amount of information obtainable 
from a set of data and the validity of the conclu- 
sions based on the data are unique functions of 
the experimental design. The analysis of vari- 
ance and covariance were the appropriate tech- 
niques for testing the null hypotheses of the study, 
fixed by the design of the experiment. 

The analysis of the experimental data involved
three basic steps: first, the calculation of
the L1 function for the test for equal variability
within the laboratory sections, an assumption
underlying the technique of the analysis of variance;
second, carrying out the analysis of variance,
which consists in breaking down the total sum
of squares and assigning each component to its
appropriate source, the components with the
corresponding degrees of freedom and the resulting
mean squares then being recorded in an analysis
of variance table; and third, the interpretation
of the results in terms of the null hypotheses.
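
The second step can be sketched for a 2 x 4 layout with equal subclasses. The scores below are hypothetical (three students per cell, invented purely for illustration) and the function is a minimal fixed-effects decomposition, not the author's computing routine. It omits the L1 test and the covariance adjustment.

```python
from statistics import mean

# Hypothetical scores: 2 methods x 4 instructors, 3 students per cell.
CELLS = {
    ("conventional", 1): [22, 25, 27],
    ("conventional", 2): [24, 26, 23],
    ("conventional", 3): [21, 28, 25],
    ("conventional", 4): [26, 24, 27],
    ("demonstration", 1): [19, 23, 21],
    ("demonstration", 2): [20, 22, 24],
    ("demonstration", 3): [18, 21, 23],
    ("demonstration", 4): [22, 20, 19],
}


def two_way_anova(cells):
    """Break the total sum of squares into its four sources."""
    methods = sorted({m for m, _ in cells})
    instructors = sorted({i for _, i in cells})
    n = len(next(iter(cells.values())))  # equal subclass size
    scores = [x for v in cells.values() for x in v]
    grand = mean(scores)
    cell_mean = {k: mean(v) for k, v in cells.items()}
    m_mean = {m: mean([x for (mm, _), v in cells.items() if mm == m for x in v])
              for m in methods}
    i_mean = {i: mean([x for (_, ii), v in cells.items() if ii == i for x in v])
              for i in instructors}
    a, b = len(methods), len(instructors)
    ss = {
        "methods": n * b * sum((m_mean[m] - grand) ** 2 for m in methods),
        "instructors": n * a * sum((i_mean[i] - grand) ** 2 for i in instructors),
        "interaction": n * sum(
            (cell_mean[(m, i)] - m_mean[m] - i_mean[i] + grand) ** 2
            for m in methods for i in instructors),
        "within": sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v),
    }
    df = {"methods": a - 1, "instructors": b - 1,
          "interaction": (a - 1) * (b - 1), "within": a * b * (n - 1)}
    ms = {src: ss[src] / df[src] for src in ss}
    f_ratios = {src: ms[src] / ms["within"] for src in ss if src != "within"}
    total_ss = sum((x - grand) ** 2 for x in scores)
    return ss, df, ms, f_ratios, total_ss


ss, df, ms, f_ratios, total_ss = two_way_anova(CELLS)
for source in ("methods", "instructors", "interaction", "within"):
    print(source, round(ss[source], 3), df[source], round(ms[source], 3))
```

Each F ratio is the mean square for a source divided by the within-cells mean square, and would be compared with tabled F at the chosen significance level.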

The analysis of covariance provided the means 
for controlling the effects of the students’ mathe- 
matical background, mental aptitude (ACE), and 
previous knowledge of the field as measured by 
the respective tests used. This is essentially a 
method whereby the subjects can be equated on 
the measures of initial status by mathematical 
adjustments. 

It was decided at the time the experiment was 
being designed to set the level of significance at 
the five percent level. 


3. The Evaluation Instruments


Information on the initial status of the students
was obtained by means of a special Personal
Data Form, the American Council on Education
Psychological Examination, and a series of
pre-tests: mathematics, laboratory written,
short-item laboratory practical, and long-item
laboratory practical. The achievement in physics
at the end of the quarter was measured by
several post-tests: mechanics theory, laboratory
written, short-item laboratory practical, and
long-item laboratory practical. The post-tests
were identical to the corresponding pre-tests.

The information from the Personal Data Form
was used to assign students to the experimental
and control sections, and to ascertain the characteristics
of the subsamples. The raw scores
on the ACE Psychological Examination for College
Freshmen were used as measures of the subjects'
mental ability.

A 30-item, 30-minute mathematics test was
constructed by the writer, designed to sample the
mathematical understandings and skills needed
in elementary, non-technical physics. The Hoyt¹
reliability of the test was 0.75, significant at the
1 percent level. One of the test items is reproduced
below.


( ) 26. In the equation P = KMN/S², what will happen
to the value of P if N and S are doubled?


(1) P will remain unchanged

(2) P will decrease to one-half of the original
value

(3) P will be quadrupled

(4) P will become one quarter of the original
value

(5) One cannot predict the effect on P without
knowing the specific value of all the
quantities


The student’s knowledge of elementary mech- 
anics at the beginning of the quarter was evalu- 
ated by means of a 33-item, 45-minute test on 
vocabulary, principles and applications of elem- 
entary mechanics. The items were selected so 
as to correspond to the topics in the course from 
the Cooperative Physics Tests for College Stud- 
ents, Forms C, D, E, F. The test was found to 
be unreliable by the Hoyt method as a pre-test; 
the reliability of the post-test was 0.56, signifi- 
cant at the 1 percent level. One of the test items 
is reproduced below. 


( ) 19. A man weighing 160 lb is in an elevator
that has an upward acceleration of 16 ft/sec².
The upward force exerted on him
by the elevator floor is about


(1) 80 lb
(2) 160 lb
(3) 176 lb
(4) 240 lb
(5) 320 lb


The laboratory written test was designed to 
measure some of the specific outcomes associ- 
ated with the work covered in an elementary mech- 
anics laboratory. The 30 items in the test were 
based primarily on the laboratory experiments 
scheduled for the Fall quarter, 1950. The admin- 
istration time for the test was 45 minutes. The 
pre-test was found to be unreliable by the Hoyt 
method for the 64 subjects in the experimental 
and control groups. The Hoyt reliability of the 
post-test was 0.51, significant at the 1 percent 
level. One of the test items is reproduced below. 


( ) 5. In the experiment on the free fall of a body
under the gravitational acceleration g, a
stylus attached to the falling body vibrates
with a frequency of 120 vibrations/sec. as
the body falls, thus tracing out a wavy line
on a strip of waxed paper. One can then
make calculations of the velocity of the falling
body and of its acceleration during the
time of fall. Suppose that (a) the stylus
actually vibrates 119 times per second and
the student uses the value 120 v.p.s. and
(b) the meter stick used in taking measurements
is slightly longer than one meter.
Which of the following statements will
be true?


(1) Both effects (a) and (b) tend to increase 
the calculated value of g. 

(2) Both effects (a) and (b) tend to decrease 
the values of g. 

(3) (a) tends to increase the value of g while 
(b) tends to decrease it. 

(4) (a) tends to decrease the value of g 
while (b) tends to increase it. 

(5) Neither (a) nor (b) will have any effect 
on the value of g. 


1. Cyril J. Hoyt. "Test Reliability Estimated by Analysis of Variance,"
Psychometrika, VI (June 1941), pp. 153-160.


The short-item laboratory practical test was
devised to measure the understanding of physical
principles in terms of apparatus setups, the
possession of a few manipulatory skills and techniques,
and the ability to solve simple problems
involving the instruments and materials commonly
found in a mechanics laboratory. The test consisted
of 18 performance items (stations), with
a total of 35 possible responses. The verbal description
of the given apparatus and the problem
were typed on a 4 x 6 card, placed near the apparatus.
The students were assigned at random to
stations at the beginning of the test and moved to
the following station at three-minute intervals upon
a signal from the instructor. The responses
were recorded in a special booklet. A detailed
scoring key was made by the authors of the test.
Each test was graded independently by two graduate
assistants, and their scores averaged. The
Hoyt reliability was 0.46 for the pre-test and 0.68
for the post-test, both significant at the 1 percent
level. One of the test items is reproduced below.


Location No. 2

Given: Open-tube manometer containing
water connected to a gas jet, meter
stick, calipers, table of densities.

Problem: Determine the gas pressure at the
jet, in centimeters of mercury.

When instructor signals, move to Location
No. 3.


The long-item laboratory practical test was
devised to measure the ability to solve problems
by means of apparatus in situations more complex
than those of the short-item test. The 9
items of the test were distributed among 6 "stations".
The students moved from one station to
the next at 9-minute intervals. The Hoyt reliability
was 0.58 for the pre-test and 0.68 for the
post-test, both significant at the 1 percent level.
One of the test items is reproduced below.


Location No. 25

Given: Semi-analytical balance, set of
standard weights, damaged standard
weight.

Problem: To find the percentage error in the
value stamped on the damaged standard
weight.

When instructor signals, move to Location
No. 26.


All of the tests described above were validated
by prominent college physics teachers. The majority
of the items had a workable face validity.
The Davis difficulty and discrimination indices²
were calculated for all the items on the four post-tests.
Most of the items on each test had a difficulty
index in the 25-75 range.

The reliability of grading the practical tests
was determined by calculating the product moment
correlations and by using the t-test for correlated
measures for comparing the means of
the two corresponding sets of grades.³ The product
moment correlations are high for all pairs
of graders. On the other hand, there was a significant
difference in the means in five of the
comparisons and no difference in the means for
five other comparisons. The grades on the short-item
and long-item practical examinations used
in subsequent analyses were the average grades
of two graduate assistants.


4. The Experimental Procedure 


a. The Population. All the subjects of the
study were students enrolled in Physics 1a, a
general, non-technical mechanics course, Fall
quarter 1950, at the University of Minnesota. Of
the 194 students registered in the course, 187
were males and 7 were females. The typical student
was a sophomore in the College of Pharmacy,
or a sophomore in the College of Science,
Literature and the Arts, following a pre-dental
sequence.

b. Sample Selection and Characteristics. From
the information filled out by the students, it was
possible to select 108 students who could be assigned
to the 4 experimental and the 4 control
groups. Every student selected for the study was
a Minnesota high school graduate, had a home address
in the continental USA, had not previously
taken Physics 1a or its equivalent, was a male,
and could take laboratory on Tuesdays.


The requirement of randomness was satisfied
by having one control and one experimental group
meet during the same two-hour interval and assigning
the students within that time block completely
at random. The usual random number
technique was used. The random numbers were
those of Kendall and Smith.⁴ Withdrawals from
the course and incomplete data reduced the original
number of selected subjects to 87. The groups
were unequal and statistical considerations made
it desirable to have the same number of cases
in each group. Consequently, a number of cases
were rejected by the random number technique
until each group was equal in size to the smallest
one. This reduced the number of cases to 64 for
the analyses based on the two written tests, and
to 56 on the two laboratory practical tests.
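
The equalizing step can be sketched as follows. The group sizes are hypothetical: the paper gives only the totals (87 before, 64 after for the written tests), so the eight sizes below were chosen merely to match those totals. A seeded pseudo-random generator stands in for the Kendall and Smith tables.

```python
import random

# Hypothetical sizes for the 8 laboratory groups (sum 87, smallest 8).
HYPOTHETICAL_SIZES = [12, 11, 10, 13, 8, 11, 12, 10]


def equalize_groups(groups, rng):
    """Randomly retain as many cases per group as the smallest group holds."""
    target = min(len(members) for members in groups.values())
    return {
        name: sorted(rng.sample(members, target))
        for name, members in groups.items()
    }


# Build groups of student identifiers, numbered consecutively.
groups, next_id = {}, 1
for index, size in enumerate(HYPOTHETICAL_SIZES, start=1):
    groups[f"group_{index}"] = list(range(next_id, next_id + size))
    next_id += size

rng = random.Random(1950)  # fixed seed so the sketch is repeatable
equalized = equalize_groups(groups, rng)
print({name: len(members) for name, members in equalized.items()})
```

With these sizes every group is cut to 8 cases, giving 64 in all, which matches the count reported for the written-test analyses.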


2. Frederick B. Davis. Item Analysis Data (Cambridge, Mass.: Harvard University
Press, 1949).

3. Palmer O. Johnson. Statistical Methods in Research (New York: Prentice-Hall,
Inc., 1949), p. 75.

4. M. G. Kendall and B. B. Smith. Tables of Random Sampling Numbers, Tracts
for Computers, No. 24 (London: Cambridge University Press, 1940).


c. Selection of the Laboratory Instructors.
The elementary physics laboratories at the University
of Minnesota are taught by graduate students
in physics. It was not possible to select 4
instructors at random from all the available tea- 
ching assistants. The instructors whose sched- 
ules permitted their participation in the study had 
taught elementary laboratories during the prev- 
ious school year and were familiar with the ap- 
paratus, the procedures, and the grading system 
of the physics laboratories at the University of 
Minnesota. Each of the instructors had one lab- 
oratory group in the morning and one in the after- 
noon. Since an experimental and a control group 
were scheduled for each of the 4 laboratory per- 
iods, each pair of instructors tossed a coin to 
determine who would take the control group in 
the morning and the experimental group in the 
afternoon. 

d. The Instructional Procedure. In the experimental
groups the instructors demonstrated the
experiment of the day. The instructors assembled
the apparatus, made the necessary adjustments,
read the scales and instruments, and
performed the experiment as indicated in the laboratory
manual. The students observed the techniques
used and examined the apparatus, but were
not allowed to assemble, manipulate or adjust
any equipment. The data were recorded on the
blackboard by the instructor. The students were
encouraged to participate in the preliminary discussion
and to ask questions throughout the demonstration.
The students made calculations from
the data in class, if time permitted, or outside
of class. A written report on the experiment had
to be submitted before the next laboratory session.


In the control groups the conventional method 
prevalent in most of the elementary college phys- 
ics laboratories in the United States was used. 
The students performed the scheduled experi- 
ments in groups of two. The assembly and ad- 
justment of the apparatus were done by the stud- 
ents, and the experimental procedure of the lab- 
oratory manual was closely followed. The data 
were checked by the instructor before the stud- 
ents left the laboratory. A written laboratory 
report was required. 

e. The Control Factors. The control factors
for the eight groups were:


(1) The three lectures per week were identical.

(2) The textbook, the assignments, and the tests 
were the same. 

(3) An equal number of clock hours were spent 
in the laboratory by all students. 

(4) The same experiments with the same equip- 
ment and laboratory manual were performed. 

(5) One experimental and one control group were 
taught by each of the four instructors. Thus 
the instructor factor was kept constant in each 
replication. 
(6) All the laboratories were scheduled on Tuesday
of each week. Whether the experiment
came before or after the theory had been discussed
in lecture, the time relationship was
the same for all groups.

(7) A short quiz on the experiment of the day was 
given at the beginning of the laboratory per- 
iod. It was thought that the students in the 
experimental groups might otherwise have 
come to the laboratory unprepared. 

(8) The initial status of the students was adjust- 
ed by the analysis of covariance. The groups 
were equated on mental aptitude (ACE), on 
mathematical background, and on their prev- 
ious knowledge of physics. 


5. Analysis of the Experimental Results 


The primary data for the statistical analysis 
of the experimental results consisted of raw 
scores on six initial measures and four achieve- 
ment tests described above. The analysis of var- 
iance and covariance technique was used to test 
the three null hypotheses for each of the four cri- 
terion measures. Each of the analyses involved 
calculating the sums of squares and cross-prod- 
ucts of the test scores, testing the equality of 
within-group variances, setting up the analysis-of- 
variance table, adjusting the sums of squares for 
the variables to be partialled out, applying the 
F-test, testing the homogeneity of the within re- 
gression coefficients, setting up analysis-of-var- 
iance and covariance tables, and testing the null 
hypotheses by the F-test. 5 
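
The adjustment of the sums of squares described above can be sketched in miniature. The following is only an illustration under simplifying assumptions: it handles a single classification (methods) and a single covariate, whereas the study used a 2 x 4 design with several initial measures. The function names and the scores are hypothetical and are not taken from the investigation.

```python
def sums(pairs, mean_x, mean_y):
    """Sums of squares and cross-products of (x, y) pairs about the given means."""
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxx, syy, sxy


def ancova_f(groups):
    """One-way analysis of covariance with one covariate.

    groups maps a label to a list of (initial score, criterion score) pairs.
    Returns (F, df_between, df_within).
    """
    everyone = [pair for pairs in groups.values() for pair in pairs]
    n, k = len(everyone), len(groups)
    grand_x = sum(x for x, _ in everyone) / n
    grand_y = sum(y for _, y in everyone) / n
    txx, tyy, txy = sums(everyone, grand_x, grand_y)

    # Pool the within-group sums of squares and cross-products.
    wxx = wyy = wxy = 0.0
    for pairs in groups.values():
        mean_x = sum(x for x, _ in pairs) / len(pairs)
        mean_y = sum(y for _, y in pairs) / len(pairs)
        sxx, syy, sxy = sums(pairs, mean_x, mean_y)
        wxx, wyy, wxy = wxx + sxx, wyy + syy, wxy + sxy

    # Partial out the covariate from the total and within sums of squares.
    adjusted_total = tyy - txy ** 2 / txx
    adjusted_within = wyy - wxy ** 2 / wxx
    adjusted_between = adjusted_total - adjusted_within

    df_between, df_within = k - 1, n - k - 1
    f = (adjusted_between / df_between) / (adjusted_within / df_within)
    return f, df_between, df_within


# Hypothetical (pre-test, final test) scores for two small groups.
example = {
    "individual": [(1, 5), (2, 7), (3, 10)],
    "demonstration": [(1, 2), (2, 4), (3, 7)],
}
f_value, df1, df2 = ancova_f(example)
print(f"F = {f_value:.2f} with {df1} and {df2} degrees of freedom")
```

The resulting F would then be compared with the tabled value for the stated degrees of freedom, one degree of freedom having been given up for the covariate.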

Table I summarizes the complete analysis of 
variance and covariance for the short-item prac- 
tical test. 

By referring to Snedecor’s tables of F the hy- 
pothesis of no difference between the means of 
methods was rejected at the 1 percent level. It 
was concluded that there were significant differ- 
ences between the demonstration and the conven- 
tional methods of teaching elementary college 
physics. The conventional or individual meth- 
od was superior with respect to the outcomes 
measured by the short-item practical test. The 
hypothesis of no difference between the means of 
students under the four instructors was accepted. 
It was concluded that no statistically significant 
differences in the means could be traced to the 
individual instructors. The hypothesis of no in- 
teraction between instructors and methods was 
also accepted. That is, no method worked bet- 
ter for any one instructor than it did for any other 
one, as measured by the means on the short-item 
laboratory final test. 

The three null hypotheses were accepted for 
the mechanics theory, the laboratory written and 
the long-item practical tests. 


5. Johnson, op. cit., Chs. X and XI. 

Table II summarizes the findings of the investigation. 


6. Summary and Conclusions 


The subjects of the investigation were stud- 
ents in the first quarter of the general, non-tech- 
nical physics course at the University of Minne- 
sota. 

A 2 x 4 randomized block design was used to 
test the hypotheses of no difference between 
instructional methods, no difference between 
instructors, and no interaction between instructors 
and methods. 

The students were assigned to an experiment- 
al or a control group meeting during the same 
period by the random number technique. 


In the control groups the conventional method 
was used. Students worked in pairs, from a lab- 
oratory manual, and submitted a written report. 
In the demonstration groups the instructors man- 
ipulated the apparatus and materials of the exper- 
iments. The students observed, copied the data 
obtained by the instructor, made calculations, 
and wrote a report. 


The laboratory instructors were graduate stu- 
dents in the Department of Physics. Each of the 
four instructors had one control and one experi- 
mental group. 


The criterion measures were a theory, a lab- 
oratory written and two laboratory practical tests 
in mechanics. 

The analysis of variance and covariance tech- 
niques were applied to the data. The null hypoth- 
eses were tested by holding constant the raw 
scores on the ACE, the mathematics pre-test, 
and the pre-test for each of the four criteria. 


The only hypothesis rejected by the F-test was 
that of no difference between methods on the short- 
item laboratory practical test. 

On the basis of the results of this study it was 
valid to conclude that the conventional method in 
the physics laboratories was more effective than 
the demonstration method for the teaching of in- 
strumental situations, simple measuring tech- 
niques and problems involving apparatus. It was 
also concluded that neither method was superior 
for the more complex laboratory problems. The 
experimental evidence appeared to justify the con- 
clusion that neither method influenced measurably 
the scores on pencil-paper tests based on the ma- 
terial of the lectures and laboratory work. 

No statistically significant differences in the 
means of the four criteria could be attributed to 
the differences between the individual instructors. 
Nor did any one method appear to give better 
results for one instructor than for any other 
instructor. 


7. Recommendations 


It should be understood that one has to take in- 
to account the limits of the generalizability of ex- 
perimental evidence. There are two aspects to 
these limits: first, population generalizability, 
and second, ‘‘ecological’’ generalizability. Our 
recommendations are related to the population of 
physics students who are not to become physical 
scientists, but need the physics background for 
their major fields of study. Specifically, the pop- 
ulation for which these recommendations hold consists of 
non-technical students in the College of Science, 
Literature and the Arts and architecture majors 
in the Institute of Technology. The recommendations 
will also depend on the range of experimental 
conditions which were included in the 
study. Specifically, these were the contrasting 
treatments, as reflected in the demonstration and 
individual methods, and the four measures of in- 
structional outcomes. 

On the basis of the experimental evidence of 
this investigation it is recommended that the pres- 
ent instructional practices in the elementary phys- 
ics laboratories be modified for certain non-tech- 
nical curricula. If a curriculum requires only 
an ‘‘understanding’’ of the elementary physical 
principles as measured by the paper-and-pencil 
objective tests, used in this study, then a demon- 
stration laboratory is as satisfactory as the con- 
ventional, or individual laboratory. Since neither 
method proved to be superior, it might be desir- 
able to use a combination of the two methods. 


It has not been definitely established that 
paper-and-pencil tests have the same validity as 
performance examinations. Consequently, it is 
recommended that an intensive study be made of 
behavior-sampling performance tests, particularly 
in their relation to paper-and-pencil tests on the 
same subject matter. 


Since the results of the long-item laboratory 
test did not prove to be decisive in favor of either 
method, it is recommended that at least part of 
the laboratory time be spent on the solution of 
practical, instrumental problems. 


A valid recommendation seems to be that if 
more complex experiments are to be conducted 
by the individual laboratory method, then more 
time should be required to achieve the desired 
objectives. It is also conceivable that the dem- 
onstrations of the more complex experiments 
might be advantageous, particularly if the time 
available remains constant. It is recommended 
that the teaching of the more complex experiments 
be put in the hands of instructors especially 
selected for their demonstration ability. 


8. Implication for Further Research 


There are many aspects of science instruc- 
tion that were not included within the scope of 
this study. For instance, no attempt was made 
to evaluate directly the ability to use the ‘‘scien- 
tific method’’. 

Following are typical problems that were left 
unanswered by this study: 


For which individual laboratory experiments 
is the demonstration method superior? 

Would the findings of this investigation be the 
same for a population of technical students? 

Could the difference between the instructors 
become more pronounced through special train- 
ing in demonstration techniques? 

How can the reliability and the validity of lab- 
oratory performance tests be raised? 

Would the problem-solving method be super- 
ior to either the conventional or demonstration 
methods? 




The writer wishes to express his gratitude to 
Professor Palmer O. Johnson for his invaluable 
help during the course of this study. His keen 
interest, wise judgment, stimulating discussion, 
and generous grants of time are thankfully ack- 
nowledged. The writer is especially indebted to 
Dean J. W. Buchta and Professor C. N. Wall for 
the opportunity given the writer to teach and car- 
ry out the investigation in the Physics Depart- 
ment under most favorable conditions. Profes- 
sors Robert J. Keller and Cyril Hoyt were con- 
sulted often and with profit. Messrs. William 
Moonan and Clayton Stunkard were of great help 
with the statistical treatment of the data. Thanks 
are due to Dr. Lynne E. Trainor for scheduling 
and supervising the practical examinations and 
for contributing many of the test items. Messrs. 
K. Anderson, R. Lagergren, T. Stratton, N. 
Horwitz, and M. Kettner, graduate assistants 
in the Physics Department, offered excellent co- 
operation as participating instructors. 


AN APPLICATION OF SOCIOMETRIC TECH- 
NIQUES TO SCHOOL PERSONNEL 


J. WAYNE WRIGHTSTONE 
GEORGE FORLANO 
PAUL GASTWIRTH 
Board of Education, City of New York 


Purpose 


THE UTILIZATION of sociometric tech- 
niques is increasing among educators in assess- 
ing and analyzing social relationships among el- 
ementary and high school pupils. The use of 
these techniques among teacher groups is begin- 
ning to receive the attention of research workers. 
The purpose of this study is to apply sociometric 
techniques for the evaluation of a proposed meth- 
od of improving the intra-staff acceptability of 
teacher isolates. 

Jennings,1 in Sociometry in Group Relations, 
indicated that there were three factors which pro- 
moted social development in the classroom to a 
significant extent as reflected in sociometric 
structure. These factors were the warmth of the 
teacher, activities which permit a high degree of 
interaction, and the use of democratic methods. 
It may be that to some extent the same three fac- 
tors, among others, are operative in the sphere 
of intra-staff relationships in a school. In study- 
ing the sociometric structure of a school faculty, 
we may consider that the principal plays a role 
analogous to that of the classroom teacher in Jen- 
nings’ experiments. 


Role of the Principal in the Socializing Program 


The major purpose of this experiment was to 
evaluate a certain supervisory program for im- 
proving intra-staff relationships in a school. The 
method consisted of involving teacher isolates in 
many diverse socializing activities in a natural 
setting. In addition to providing abundant oppor- 
tunities for interpersonal relationships through 
discussion groups, conferences, and multi-teach- 
er staffed projects for the social development of 
the least popular teachers, the principal also en- 
riched or amplified the regular school activities 
that were currently being carried on by the ex- 
perimental teacher isolates. 

A major adjunctive aspect of the socialization 
program, moreover, was the provision for the 
close association of the experimental isolates with 
the most popular teachers. This aim was 
achieved by placing teacher isolates together with 
the most popular teachers on a special committee. 


Participating School Personnel 


Seven representative elementary and junior 
high schools were selected by the assistant sup- 
erintendent in charge of the school district. Each 
school had a regularly licensed principal, and 
all but one had an assistant to the principal. The 
staffs of the seven schools ranged in size from 
35 to 44 teachers in October 1948, and from 32 
to 41 teachers in June 1949. The total number 
of teachers involved in this project was 302 on 
the former date and 293 at the end of the school 
year. Substitute teachers were not included in 
the analysis of the sociometric data because of 
the provisional nature of their positions. Each 
principal supervised the development of this pro- 
ject in his own school. The entire experiment 
was coordinated by one selected principal and 
one selected research worker, under the general 
supervision of the Assistant Superintendent in the 
field and the Director of the Bureau of Education- 
al Research. 


Sociometric Determination of Isolates and Stars 


While the major aim of this study was to eval- 
uate a technique for improving the quality of hum- 
an relationships within a school staff, the first 
step was to devise a technique for analyzing the 
sociometric structure of a school faculty and dis- 
covering teacher isolates and stars. 

In view of the maturity and sophistication of 
the subjects, the commonly used sociometric 
methods of group analysis were not feasible. It 
was decided, therefore, to try a tangential ap- 
proach which would mask the real nature of the 
inquiry. Accordingly, a real honest-to-goodness 
educational project, inherently valuable, was 
launched, namely, the identification of obstacles 
to effective teaching. A committee was designat- 
ed in each of the seven schools to make a study 
of these obstacles and to send a report to the 
Assistant Superintendent. These school reports 
were duly submitted, summarized by a committee 
of supervisors, and presented to some sixty 
supervisors at a conference in June 1949. 


1. Helen H. Jennings. Sociometry in Group Relations (Washington, D.C.: 
American Council on Education, 

The culmination of this conference was the co- 
operative formulation by the conferees of a ser- 
ies of suggestions as to how to eliminate those 
factors which were considered by the teachers 
themselves as their greatest obstacles to good 
teaching. 

To initiate this project, the teachers in each 
school were asked to elect an Obstacles Committee 
of seven members to make a survey of the obstacles 
to good teaching. The teachers indicated their 
choices by listing seven names, in order of preference, 
on a secret ballot. The choices of the teachers 
were tabulated and, on the basis of an analysis of these 
data, the teachers in each school were ranked in order 
of popularity or degree of intra-staff acceptability. 
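
The tabulation of ballots might be sketched as follows. This is a minimal illustration, not the procedure actually used: it assumes each mention on a ballot counts as one vote regardless of its order of preference, and it breaks ties alphabetically. The function name and the teacher names are hypothetical.

```python
from collections import Counter


def rank_by_votes(roster, ballots):
    """Return {teacher: positional rating}, where 1 means fewest votes received."""
    votes = Counter({name: 0 for name in roster})  # teachers never named keep zero votes
    for ballot in ballots:
        for name in ballot:
            votes[name] += 1
    # Assumption: ties are broken alphabetically so the ordering is reproducible.
    ordered = sorted(roster, key=lambda name: (votes[name], name))
    return {name: position for position, name in enumerate(ordered, start=1)}


# Hypothetical staff of four and three short ballots.
roster = ["Adams", "Baker", "Clark", "Davis"]
ballots = [["Baker", "Clark"], ["Baker", "Davis"], ["Clark", "Baker"]]
ratings = rank_by_votes(roster, ballots)
print(ratings)
```

Under this scheme the teachers with the lowest ratings would be treated as isolates and those with the highest as stars. A weighting by order of preference could be substituted without changing the rest of the sketch.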


As a rough check of the validity of this meth- 
od of ascertaining the isolates and stars in a 
school staff, the principal of one school record- 
ed, before the tabulation of teacher choices, the 
names of seven teachers who, in his opinion, 
would be voted most acceptable by the staff, and 
several who would be judged as least acceptable. 
After the tabulation of the teachers’ votes had 
been completed, it was seen that the principal 
had correctly predicted six of the first seven stars, 
and four of the lowest six isolates. In subsequent 
informal discussions with the other principals, an 
approximately equal predictive ability on their 
part was reported. The close correspondence be- 
tween the teachers’ selection of stars and isolates 
and the principals’ judgments based on an intimate 
knowledge of their staff members, was taken as a 
rough indication of the validity of the committee- 
election technique in yielding sociometric data. 


The Composition and Role of the Obstacles 
Committee 


In each school, the Obstacles Committee was 
composed of seven members. It included the high- 
est four teachers in the popularity ranking based 
on the teachers’ votes and three selected from 
among the least popular six teachers. The six 
teachers who were judged by their peers as least 
acceptable, were divided into two equated groups 
of three each. The ‘‘experimental group’’ was 
included in the Obstacles Committee; the ‘‘con- 
trol group’’ was not provided by the experiment- 
ers with any special assignments or activities. 

The function of a school’s Obstacles Commit- 
tee was to conduct a survey of those handicaps 
and limitations in the school that the faculty con- 
sidered to be definite blocks to better teaching. 
The Obstacles Committee met regularly at least 
twice a month to make plans for its survey and 
to discuss its findings. It was hoped that the 
Obstacles Committee would provide a major source 
of socializing experiences for isolates by providing 
them, first of all, with regular and continued 
association with the most popular teachers, and 
second, with frequent contacts with other staff 
personnel. 

The survey plans made by the local commit- 
tees were submitted to the central office and the 
experimenters then compiled a master list of 
suggested patterns of research. This list was 
returned to each school committee to guide it in 
the organization of its inquiry and to direct it 
toward the use of social techniques of research 
such as group discussions, department and grade 
conferences, and personal interviews. The purpose 
of this step was to increase the amount and 
quality of the socializing activities of the teacher 
isolates on the various Obstacles Committees by 
placing them in situations inviting face-to-face 
relationships with their colleagues on the staff. 


Experimental Design 


The evaluative procedure that was utilized to 
measure the effect of the experimental factor, 
that is, the supervisor’s socialization program, 
consisted of an analysis of the sociometric data 
of two groups of teachers equated as to popularity, 
one of which was exposed to the experiences 
incorporated in the socialization program while 
the other group followed no specially designed 
program of activities. 

After the experimental period, the final socio- 
metric ratings of the teachers in the experiment- 
al group were compared with the final sociomet- 
ric rating of the teachers in the control group. 
The next step in the evaluative process was to ob- 
tain the amount of improvement or gain, as meas- 
ured by sociometric ratings, of both the experi- 
mental and control groups. If the gain or im- 
provement of the experimental group was greater 
than that for the control group the difference in 
favor of the experimental group might be inter- 
preted to mean that the socializing program had 
discernable beneficial effects. 


Method of Evaluating the Socialization Program 


How effective was the socialization program 
in improving the popularity or intra-staff accept- 
ability of teacher isolates? Acceptability was 
measured in terms of each teacher’s position or 
rank as determined by the tabulation of the staff’s 
preferential balloting for the members of the 
Obstacles Committee. To measure the effects of 
the socialization program, a similar election had 
to be held in June 1949. For this purpose, an- 
other genuinely worthwhile educational project 
was submitted to the teachers, namely, the election 
of a Teachers’ Council. In those schools 
which already had an active Teachers’ Council, 
a committee was elected to evaluate the work of 
the Teachers’ Council with a view to improving 
its functioning. 

To measure the effectiveness of the socializa- 
tion program a comparison was made between 
the initial positional ratings of the lowest six 
teachers in October 1948 and their final position- 
al ratings in June 1949. The time that elapsed 
between the initial and final rating periods amount- 
ed to about eight months. Inasmuch as it took a 
few weeks to set up the Obstacles Committee in 
each school, it would be safe to say that the dur- 
ation of the experimental socialization program 
was approximately seven months. 

In those cases in which the final popularity 
rank was higher than the initial rank, this fact 
was interpreted as a beneficial effect of the soc- 
ialization program. When the final popularity 
rank was equal to or less than the initial standing, 
it was felt that this evidence indicated no positive 
effect of the socialization program. 

The positional rating which indicated the pop- 
ularity of each teacher was transmuted into an 
intra-staff acceptability score by utilizing Hull’s2 
formula for the transmutation of rank orders of 
merit into units of amount or scores. This trans- 
mutation was undertaken because it was felt that 
acceptability as a trait would most likely be dis- 
tributed normally among individuals. The assum- 
ption of normality in a trait implies that differ- 
ences at the extremes of the trait are relatively 
much greater than differences around the mean. 
Thus, while all differences in an order of merit 
series are equal to one, the differences among 
the transmuted scores varied considerably. 

The application of the formula achieved still 
another purpose. Since the number of teachers 
on a school’s staff varied from September 1948 to 
June 1949, as well as from school to school, some 
method had to be employed to equate for the dif- 
ferences in staff membership. Thus, a sociomet- 
ric rank of six in a school’s staff of 42 teachers 
would not be equivalent to a rank of six in a school 
with a faculty register of 33. 

The application of the Hull formula served to 
adjust the inequalities among the schools due to 
differences in the number of teachers on a school’s 
staff. 

The mechanics of calculating the sociometric 
transmuted scores was simple. The teacher with 
the fewest number of votes was assigned a pos- 
itional rating of one, the next in line was number 
two, and so on in ascending order. Comparative 
data was compiled only for the six isolates. The 
change in sociometric rank of each of the six teacher 
isolates was noted by subtracting the transmuted 
score expressing the teacher’s positional 
rating in October 1948 from the score indicating 
his rank order in June 1949. The numerical dif- 
ference between the initial and final scores rep- 
resented the gain or loss in acceptability. 
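
The transmutation and the computation of gains can be illustrated with a short sketch. Hull's formula is not reproduced in this article, so the code below is an assumption: it uses the percent-position form often associated with Hull's method, 100(R - 0.5)/N, and converts that position to a normal-curve score. The function names, the mean of 50, and the standard deviation of 20 are illustrative choices, not values taken from the study.

```python
from statistics import NormalDist


def acceptability_score(position, staff_size, mean=50.0, sd=20.0):
    """Transmute a positional rating (1 = fewest votes) into a normal-curve score."""
    if not 1 <= position <= staff_size:
        raise ValueError("position must lie between 1 and staff_size")
    # Assumed percent position: share of the staff falling below this teacher.
    fraction_below = (position - 0.5) / staff_size
    z = NormalDist().inv_cdf(fraction_below)
    return mean + sd * z


def gain(initial_position, initial_staff, final_position, final_staff):
    """Final transmuted score minus initial transmuted score."""
    return (acceptability_score(final_position, final_staff)
            - acceptability_score(initial_position, initial_staff))


# The same rank of six means different things in staffs of 42 and 33.
print(acceptability_score(6, 42), acceptability_score(6, 33))

# Hypothetical isolate: third from the bottom of 42 in October, eighth of 40 in June.
print(gain(3, 42, 8, 40))
```

Because the score depends on the rank relative to staff size, a change in the number of teachers between the two elections does not by itself register as a gain or loss. Adjacent ranks near the extremes also differ by more score points than adjacent ranks near the middle, as the normality assumption requires.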


Results and Conclusions 


The sociometric change for each of the six 
teacher isolates in each of the seven schools was 
obtained. In all there were forty-two teachers; 
twenty-one experimental isolates and twenty-one 
control isolates. However, because of absences, 
transfers and the like, there remained but sixteen 
teachers in each of the two groups. 

The mean gains in acceptability scores of the 
experimental and control isolates are presented 
in Table I. In addition, the table summarizes the 
results of examining the difference in mean gain 
between the two groups for statistical significance. 
In computing the standard error of the means the 
formula for small samples was employed. 3 

The obtained difference in mean gain accept- 
ability score between the two groups is 8.18 and 
is in favor of the experimental isolates. The mean 
gain difference approaches but does not reach the 
required confidence level of at least .05. It may 
be said, however, that as compared to the mean 
gain of the control group, the mean gain improve- 
ment of the experimental group was about eighty 
percent greater. 
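
The significance statement can be checked with a brief calculation. The sketch rests on assumptions: it reads the three figures surviving in Table I (8.18, 4.92, and 1.67) as the difference in mean gain, its standard error, and the resulting t, and it takes the degrees of freedom to be those of two independent groups of sixteen. The critical value is the standard tabled two-tailed .05 value of t for 30 degrees of freedom.

```python
def t_ratio(mean_difference, standard_error):
    """Ratio of a difference in means to its standard error."""
    return mean_difference / standard_error


mean_difference = 8.18             # difference in mean gain, as stated in the text
standard_error = 4.92              # assumed: second figure printed in Table I
degrees_of_freedom = 16 + 16 - 2   # assumed: two independent groups of sixteen
critical_t = 2.042                 # tabled two-tailed .05 value of t for 30 d.f.

t = t_ratio(mean_difference, standard_error)
significant = abs(t) >= critical_t
print(f"t = {t:.2f}, d.f. = {degrees_of_freedom}, significant at .05: {significant}")
```

On these assumptions the ratio falls short of the critical value, which agrees with the statement that the difference approaches but does not reach the .05 level. The small disagreement with the printed 1.67 would be expected if the table entries were rounded before printing.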

The chi-square analysis of the sociometric 
data, and the calculation of the significance of a 
difference in means of related measures of paired 
groups, gave results that were practically ident- 
ical with those obtained in the analysis presented 
in Table I. 

To summarize, although there was an increase 
in the intra-staff acceptability score of the con- 
trol isolates, the gain in acceptability score of 
the experimental isolates, while it only approach- 
ed statistical significance, nevertheless was about 
eighty percent greater than that for the control 
group. 


2. C. L. Hull. "The Computation of Pearson's r from Ranked Data," Journal of 
Applied Psychology, VI (1922), pp. 385-390. 

3. E. F. Lindquist. Statistical Analysis in ... Houghton Mifflin Co., ..., pp. ... 


TABLE I 

MEAN GAINS IN ACCEPTABILITY SCORES OF EXPERIMENTAL AND 
CONTROL ISOLATES IN SEVEN PARTICIPATING SCHOOLS 

Acceptability Scores 

Experimental 

Control 

8.18    4.92    1.67 


Special Factors 


At the end of the experimental period, the 
principals of the seven schools involved submitted 
a brief description of all of the ‘‘extra-curricular’’ 
activities in which both their experimental 
and control teachers had participated. An analysis 
of these lists of activities disclosed an interesting 
phenomenon which tended to blunt the full 
effect of the experimental factor. Many of the 
control teachers had as many socializing activities 
as the experimental teachers, if not more. 
Although the principals had taken pains to provide 
the experimental teachers with as many social- 
izing experiences as possible, the exigencies of 
school administration often demanded the partic- 
ipation of the control teachers in similarly soc- 
ializing activities. In an ideal experimental sit- 
uation the activities of the control group teachers 
would have been rigorously limited. Even in such 
a case, however, there would exist no practicable 
way of limiting the out-of-school intra-staff con- 
tacts of the control group. For example, a con- 
trol teacher might acquire a car and thereby at- 
tract a group of teachers eager for a daily lift 
to and from school. 

It is important to observe, therefore, in car- 
rying out the design of the experiment, that the 
distinction between the experimental and control 
groups was often blurred because certain control 
teachers were inadvertently and unavoidably in- 
volved in socializing activities fairly comparable 
to those planned by the principal for the experi- 
mental teachers. Further, some of the experi- 
mental teachers could not carry a full load of so- 
cializing activities because of ill health, after- 
noon or evening jobs, difficult teaching assign- 
ments, and other reasons. 

It may be reasonable to assume that if the 32 
isolates involved in this project had been re-divid- 
ed retroactively into an experimental group and 
a control group on the basis of the actual activi- 
ties engaged in, results much more favorable to 
the purposes of this experiment might have been 
obtained. Such an undertaking was not practic- 
able at this time, however, because it would have 
required the prior establishment of a rating scale 
to evaluate the socializing effect of many specific 
school activities and a prearranged uniform system 
of activity records. 


CURRENT ORGANIZATION AND PROCEDURES 
IN REMEDIAL TEACHING* 


ARTHUR E. TRAXLER** 
Educational Records Bureau 
New York City 


IT IS KNOWN in a general way that during 
the last quarter of a century numerous schools 
and colleges have developed programs of remed- 
ial teaching, especially with reference to reading. 
In attempts to obtain information on the extent of 
remedial work undertaken and on the organiza- 
tion and procedures used, various individuals and 
groups have in recent years sent out question- 
naires and have reported summaries of the re- 
plies. Space will permit specific mention of only 
a limited number of these questionnaire studies. 

In the early 1940’s, surveys of reading instruc- 
tion were reported by the Research Division of 
the National Education Association (9)1, by Blair 
(2), and by Richter and Parr (10). About the 
same time, studies of practices in remedial and 
corrective reading in college were made by Witty 
(13), Charters (5), and Triggs (11). In 1945, a 
survey was made of provisions for reading in- 
struction in secondary schools and colleges hold- 
ing membership in the Educational Records Bur- 
eau (12). F. R. Lindquist (8) surveyed the read- 
ing instruction provided college students in the 
academic year 1947-48. In 1949, Brink and Witty 
(4) reported on practices in remedial reading in 
secondary schools. In 1950, Boyd and Schwier- 
ing (3) reported a survey of child guidance and 
remedial reading practices in seventy-six clinics, 
sixty of which were associated with colleges and 
universities. A survey of remedial reading ser- 
vices with special attention to clinical services 
of nineteen public schools was reported by Jack- 
son (7) in 1951. Also in 1951, Barbe (1) gave a 
summary of replies of sixty-seven institutions to 
a postcard questionnaire which had been sent to 
ninety-five major colleges and universities through- 
out the United States. The Committee on Diag- 
nostic Reading Tests sent a questionnaire on read- 
ing instruction to schools and colleges throughout 
the United States in the spring of 1951, and a ser- 
ies of articles summarizing the replies is being 
prepared (6). The reports of these studies show 
clearly that there is wide-spread awareness of 
serious reading difficulties among students and 
that there is a need for vigorous programs of 
reading extending from the primary grades through 
the senior college. They also show that numerous 
schools and colleges have within the last few years 
undertaken some form of planned, organized act- 
ivity designed to improve reading ability. The re- 
ports indicate, however, that while a good begin- 
ning has been made, there are very few reading 
improvement programs which reach all pupils; 
that there is a good deal of confusion over both ob- 
jectives and procedures; and that limited budgets, 
dearth of trained personnel, and difficulty of en- 
listing full cooperation of staff members are ser- 
ious limitations. 

The National Association of Remedial Teach- 
ers has a favorable opportunity and a special re- 
sponsibility to provide leadership in the further 
development of programs of remedial teaching, 
not only with regard to reading but for all basic 
skills. A first step in meeting this opportunity 
and this responsibility would seem to be to find 
out what the current situation is with regard to 
remedial teaching in our own institutions. 

In the winter of 1951, President William S. 
Gray appointed a committee to survey the organ- 
ization and procedures of remedial teaching in 
institutions represented in the membership of the 
NART. The present members of the committee 
are Mrs. E. P. Gaillard, Bronxville Public 
School, Bronxville, New York; Mr. Warren 
Koehler, Milton Academy; Dr. Agatha Townsend, 
Educational Records Bureau; and the writer. Af- 
ter considerable discussion and planning, the com- 
mittee drafted a four-page questionnaire on rem- 
edial practices and sent it to about 750 persons. 
Nearly all these persons were members of the 
NART, although the questionnaire was sent to a 
few non-members connected with institutions that 
were believed to be carrying on useful experi- 
mentation with remedial teaching procedures. 


*A report given at the annual meeting of the National Association of Remedial 
Teachers, Hunter College, New York City, November 3, 1951. 


**Chairman, Research Committee on Organization and Procedures in Remedial Teach- 
ing of National Association of Remedial Teachers. 


1. The numbers in parentheses refer to numbered items in the bibliography 
at the end of the paper. 

Two hundred and seventeen usable replies were 
received. In some instances, where several staff 
members of an institution belonged to the NART, 
the questionnaire was filled out either as a group 
project or by one person representing the group. 
Thirty other persons replied that they were en- 
gaged in work to which the questionnaire did not 
apply. It is believed that more than half the ed- 
ucational institutions—schools, colleges, and 
clinics—represented in the NART membership 
are included in the study. 

The educational institutions from which re- 
plies were received include ninety-five public 
schools and public school systems, sixty indepen- 
dent schools, twenty-six colleges and thirty-six 
clinics. Separate tabulations were made for these 
classes of institutions, as well as for the total 
group. In considering the results of the question- 
naire it should be kept in mind that, since this is 
primarily a study within the NART membership, 
the schools, colleges, and clinics included in the 
study may not be a representative sample of those 
throughout the United States. It should also be 
said that there are minor inconsistencies in por- 
tions of the data because of the fact that not all 
institutions were consistent in their replies. For 
example, in connection with certain questions con- 
sisting of a main question and several subordin- 
ate questions, an occasional institution answered 
the main question negatively and then proceeded 
to answer the subordinate questions as though an 
affirmative response had been given to the first 
question. It is believed, however, that the num- 
ber of these inconsistencies is not large enough 
to affect the main conclusions of the study. 

It is not the purpose of this paper to report in 
detail the replies to all the questions. The de- 
tailed statistical data will be made available sep- 
arately.2 The present statement will consist of 
a summary and interpretation of those findings 
with regard to the status and trends of remedial 
teaching that are believed to be the more import- 
ant for the assistance of persons working in this 
field. 

An attempt will be made to state certain basic 
principles, or generalizations, which it is be- 
lieved would find wide acceptance among special- 
ists in remedial and corrective teaching and to 
consider the replies to the questionnaire in rela- 
tion to these principles. 


Developmental Reading 


Principle 1: The first basic principle is that developmental
reading is theoretically one of the
most important areas in a school’s reading
program; each school should attempt to improve
the reading of all pupils through the
regular procedures of the school. 


Approximately two-thirds of the public schools 
and half of the independent schools say that they 
do try to improve the reading of all pupils through 
their regular procedures. These data indicate 
that considerable progress has been made in the 
introduction of developmental reading programs 
into public and independent schools but that much 
remains to be done. If these figures were based 
upon replies of secondary schools alone, they 
would seem definitely encouraging, but the per- 
centages are inflated by the fact that both the pub- 
lic and independent school groups included a con- 
siderable number of schools at the elementary 
level where reading instruction of all pupils has 
long been established. 

It is not surprising to find that less than a fifth 
of the colleges attempt to improve the reading of 
all their students. In fact, it seems somewhat 
surprising that as many as five out of the twenty- 
six colleges say that they do make provision of 
this kind. 

The methods most frequently mentioned by the 
elementary schools in providing for the reading 
needs of all pupils are systematic instruction in 
a regular reading period in the classroom, indi- 
vidual and small group work in the classroom de- 
signed to take account of individual differences, 
and well-planned, free reading opportunities for 
recreation and pursuit of developing interests. 
The secondary schools most frequently carry on 
their developmental reading work by means of a 
definite program of reading instruction as an es- 
sential part of the work of the English department 
and incidental guidance in reading given in the 
regular subject courses. Other methods freq- 
uently checked, especially by the independent 
schools, were incidental instruction or guidance 
in reading in study hall or supervised study per- 
iods and well-planned free reading opportunities 
for recreation and the pursuit of the pupils’ spec- 
ial interests. 


Principle 2: At the elementary school level, the 
developmental program should be preceded by 
a reading readiness program. 


It is noteworthy that nearly nine-tenths of the 
public elementary schools replying say that they 
do have a readiness program preceding formal 
reading instruction. Only a little more than half 
the independent schools replied affirmatively to 
this question, but a considerable number of the 


2. A number of typewritten copies of the statistical tables are available on
a loan basis. A copy may be borrowed for a period of one month by writing
to the committee, care of the Educational Records Bureau, 21 Audubon Avenue,
New York 32, N. Y.
independent schools had no elementary school 
grades and thus the question was not applicable 
to them. 

It was also brought out by the questionnaire 
that most first grade pupils are tested with read- 
ing readiness tests and that in the majority of the 
schools instruction is differentiated for the pupils 
after such testing. 


Corrective Reading 


Principle 3: Many instances of mild reading dis- 
ability can be cared for by means of a group 
corrective program; it is advisable for each 
school to carry on a program of this kind. 


More than three-fourths of the 217 institutions 
say that they do have a program for the correction
of mild cases of reading difficulty. The institutions
responding affirmatively include half 
the clinics and more than three-fourths of the 
schools and colleges. It is interesting to find that 
the percentage of the colleges answering ‘‘yes’’ 
to the question is a little higher than that for the 
public and independent schools. It is surprising, 
and a bit disconcerting, to find that more than 
half the schools say that only 1 to 10 percent of 
their pupils receive group corrective instruction. 
Reading tests indicate that in most schools a much 
larger proportion of pupils need corrective atten- 
tion. 

Although the first experiments in group cor- 
rective reading were carried on about a genera- 
tion ago, organized programs of corrective read- 
ing are a new development so far as most schools 
cooperating in this study are concerned. This is 
true even among the institutions represented in 
the NART membership. Seventy percent of the 
institutions carrying on a corrective program 
stated that the program was begun during the decade,
1941-50, as compared with 14 percent for 
the preceding decade and only 3 percent for the 
period up to 1930. The rapid development of cor- 
rective reading programs during the last ten years 
is one of the most striking facts brought out by 
the questionnaire. 


Principle 4: So far as possible, classes in cor- 
rective reading should be scheduled in the reg- 
ular school day in order to give this work status 
with the pupils and to provide for frequent 
meetings of corrective groups. 


About four-fifths of the institutions indicating 
that they have a program of this kind say that they 
have organized classes in the regular school day. 
As would naturally be expected, the percentage 
of the colleges having this kind of organization 
for corrective work is somewhat smaller than 
that for the schools and the clinics, but even in 
the colleges 65 percent of the institutions have 




regular classes of this kind. A little more than 
one-third of the schools and colleges and about 
one-fifth of the clinics say that they have special 
groups meeting at free periods. A few institu- 
tions apparently use both kinds of organizations, 
but classes in the regular school day clearly are 
in the majority. 

It is not a common practice to grant regular 
course credit for corrective reading. Only about 
one institution in six indicates that credit of this 
kind is given. 


Remedial Reading 


Principle 5: Some cases of reading disability 
require intensive remedial instruction, and 
provision should be made for such instruction 
on an individual basis. 


The replies to the questionnaire show clearly 
that the great majority of the schools represent- 
ed in the NART membership now have provision 
for remedial reading for markedly retarded pupils.
Approximately nine-tenths of the independent
schools and the clinics and about three-fourths 
of the public schools and colleges say that they 
give individual remedial instruction. About 60 
percent of all these institutions have begun their 
remedial programs since 1940. 

Inquiry was made into the proportion of the 
total student group served by the remedial pro- 
gram in public and independent elementary and 
secondary schools. In general, in both the pub- 
lic schools and the independent schools, approx- 
imately 80 percent of the schools with element- 
ary grades replying to the questionnaire indicated 
that not more than 10 percent of their pupils were 
included in the remedial reading program. For 
a few schools, the number of pupils in the re- 
medial reading program ran as low as 1 percent 
of the total enrollment. At the other extreme, 
two public schools and one independent school in- 
dicated that 100 percent of their pupils were in- 
cluded, but it is believed that they must have been 
thinking of the developmental, not the remedial, 
program. 

More than 75 percent of the public high schools 
and over 85 percent of the independent secondary 
schools indicated that less than 10 percent of their 
pupils were given remedial reading. No school 
indicated that more than 30 percent of its pupils 
were included in the remedial program. 


It is clear that in the great majority of the 
schools only a small proportion of the pupils re- 
ceive individual remedial training. This is in 
accordance with what one would expect, for in- 
dividual remedial work is very time-consuming 
and ordinarily trained personnel is now available 
to handle only a small number of the most retard- 
ed cases. 
Measurement and Evaluation 


Principle 6: A program of reading improvement 
begins with evaluation; the most useful single 
technique of evaluation is the standard test. 


Nearly four-fifths of the 217 institutions replying
to the questionnaire say that they administer 
standardized reading tests regularly to all pupils. 
The percentages vary to some extent according 
to type of institution. Among the independent 
schools, 87 percent say that they make regular 
use of reading tests, while only 58 percent of the 
colleges indicate that they do so. 

The reading tests mentioned most frequently 
are the reading parts of the Stanford Achievement 
Test, the reading parts of the Metropolitan Ach- 
ievement Tests, the Gates Reading Tests, the 
Iowa Silent Reading Tests, and the Cooperative 
Reading Comprehension Test. Other reading 
tests used with considerable frequency are the 
Survey Section of the Diagnostic Reading Tests, 
the Durrell-Sullivan Reading Capacity and Achievement
Tests, the Gray Oral Reading Tests, the 
Iowa Tests of Basic Skills, the Nelson-Denny 
Reading Test, the Progressive Reading Tests, 
and the Traxler Silent Reading Test. 

With regard to time of year at which reading 
tests are given, the more common practice in 
public schools is to give them near the end of the 
year, while in independent schools and colleges 
they are more frequently administered near the 
beginning of the term. The latter procedure would 
seem preferable since it allows ample time for 
study of the results and the planning of teaching 
procedures based in part upon the strengths and 
weaknesses indicated by the test scores. 

More than half the independent schools admin- 
ister reading tests to their pupils twice a year, 
and nearly all the others give them annually. Ap- 
proximately half the public schools give reading 
tests annually, and more than one-fourth of these 
schools administer them twice a year. Seldom 
is the testing in public schools and in independ- 
ent schools less frequent than annually. Half 
the colleges say they administer the reading tests 
at irregular times. 


Principle 7: The potential of an individual’s read- 
ing ability is determined by his verbal intelli- 
gence. It follows that tests of mental ability 
should be used in conjunction with reading tests. 


More than four-fifths of the institutions reply- 
ing to the questionnaire say that they do use the 
results of mental ability tests in evaluating the 
results of reading tests. Thirty-four of the thirty-six
clinics answered this question affirmatively, 
and the other two omitted it. It is apparent from 
the replies that nearly all the institutions cooperating
in this study use tests of intelligence along 
with reading tests. 

It is significant that in response to a question 
concerning what tests of mental ability are used 
the Stanford-Binet Scale and the Wechsler-Bellevue
Scale are mentioned more frequently than any 
other tests. Both of these tests are administer- 
ed individually; reading ability has less influence 
on the results of these tests than is the case in 
most group tests. In a program of reading improvement,
it is obviously desirable to use intelligence
tests that do not depend upon reading ability.

Most group tests of mental ability call for a 
considerable amount of reading. Those that pro- 
vide for separate measurement of verbal ability 
and other aspects of mental ability have greater 
diagnostic value than the tests yielding one over- 
all score. The group tests mentioned most fre- 
quently in the replies to the questionnaire were 
the American Council Psychological Examination, 
the California Test of Mental Maturity, the Kuhl- 
mann-Anderson Intelligence Test, and the Otis 
Self-Administering Test of Mental Ability. The 
first two tests are useful in the diagnosis of read- 
ing ability. The ACE psychological examination 
provides separate linguistic and quantitative 
scores, and the California Test of Mental Maturity
yields separate IQ’s for language and non-language
factors. The Kuhlmann-Anderson Intelligence
Tests yield an overall mental age and IQ, 
but mental ages for separate parts are also avail- 
able, and if these are studied considerable infor- 
mation of a diagnostic character may be obtained. 
The Otis test provides a single mental age and 
an IQ which depend heavily upon reading ability; 
thus it is difficult to say whether low scores on 
this test indicate limited mental ability or lack 
of reading skill. 


Personnel Used in the Program 


Principle 8: While a program of reading improve- 
ment should utilize the regular procedures and 
the regular classroom teachers of a school, 
the success of such a program calls for the 
leadership of one or more specially trained 
persons. 


Nearly two-thirds of the public schools and 
independent schools and one-half of the colleges 
responding to the questionnaire have one or more 
teachers giving full time to remedial instruction. 
These proportions are somewhat larger than had 
been anticipated. They probably result from the 
somewhat selected character of the institutions 
represented in the NART. It seems unlikely that 
so large a proportion of the schools and colleges 
throughout the country are staffed with full-time 
remedial teachers. 
In addition, approximately one-third of the 
public schools, half the independent schools, and 
two-thirds of the colleges have teachers to whom 
part-time remedial instruction is assigned. 


Principle 9: In the remediation of reading disa- 
bilities, specialists in other areas frequently 
are needed; efforts should be made to have 
the services of such specialists available for 
the reading program. 


Specialists, such as psychologists, psychia- 
trists, eye specialists, and hearing specialists, 
are available to the majority of the schools with- 
in the school or within the community. However, 
none of the specialists is to be found on the staff 
of as many as half the schools. In about a third 
of the institutions replying, the staff includes a 
school psychologist. None of the other special- 
ists is employed on a regular basis by as many 
as a fifth of all the institutions, although nearly 
half the clinics carry clinical psychologists on 
their staffs. 

An item in the questionnaire also dealt with the 
training of teachers of reading, but since this aspect
is being studied much more thoroughly by 
the NART Research Committee on Qualification 
for Remedial Teaching under the chairmanship 
of Dr. Helen M. Robinson, it is not included in 
this summary report. 


Contributing Factors 


Principle 10: A variety of factors may contrib- 
ute to reading difficulty; students referred for 
special help in reading should be examined in 
these areas. 


More than half the institutions replying to the 
questionnaire indicated that when students are re- 
ferred for special help they check these students 
for the following possible contributing factors: 
handedness, eyedness, visual acuity, auditory 
acuity, speech defects, visual abnormalities, 
limited vocabulary, spelling disability, and per- 
sonality difficulty. Among these factors, vocab- 
ulary, visual acuity, and auditory acuity were 
mentioned by more than two-thirds of the institu- 
tions. The colleges lagged definitely behind the 
schools and clinics in checking on these contrib- 
uting factors. 


Principle 11: It has been observed that reading 
disability is often associated with personality 
difficulty, but causation cannot be assumed 
from such observation; each case must be stud- 
ied in order to determine whether personality 
deviation is a factor in an individual’s reading 
difficulty. 


One of the questions in the survey read as follows:
‘‘In what percentage of students referred 
is personality deviation a major factor in the reading
problems?’’ 

Those returning the questionnaire took a cau- 
tious attitude toward this question. Nearly half 
of them omitted it. Of those answering, approx- 
imately one school in six said that personality de- 
viation was thought to be a factor in reading dis- 
ability in less than 10 percent of the cases. One 
school in eight indicated 20 to 40 percent of the 
cases. One school in four said 50 to 90 percent. 
One institution in twenty thought that personality 
difficulty was a causative factor in 100 percent of 
the cases. It is difficult to draw any conclusion 
from the replies except the obvious one that there 
is a great deal of uncertainty and much difference 
of opinion concerning the role that personality 
maladjustment plays in reading disability. This 
continues to be a fertile field for research, not- 
withstanding the fact that a number of studies have 
been reported in this area in recent years. 


Equipment Used 


Principle 12: Mechanical equipment designed for 
diagnosis and instruction may frequently be 
very useful, but many schools and colleges 
seem able to carry on without such equipment. 


The number of institutions using the various 
kinds of mechanical equipment was not large. 
Among the diagnostic instruments, the telebin- 
ocular, the audiometer, and the voice recorder 
were used by approximately one-third of the in- 
stitutions. Very few of the public schools and 
independent schools were equipped with the oph- 
thalmograph, but about one-fourth of the colleges 
and clinics used this instrument. 

Among the instruments designed for instruc- 
tional use, the tachistoscope was employed by 
about two-fifths of the institutions, the reading 
rate accelerator and film strips by about one- 
third, and the metronoscope by less than one in 
ten. In general, these mechanical devices were 
used more extensively in colleges and clinics 
than in public and independent schools, although a 
larger percentage of public schools used film strips. 


Teaching Procedures 


Principle 13: A wide variety of teaching proced- 
ures should be used with pupils retarded in 
reading; procedures should be chosen with re- 
gard to their appropriateness for the difficulty 
as indicated by the diagnosis. 


The replies to the questionnaire show that 
many different approaches to instruction in remedial
reading are used. The questionnaire listed 
thirteen kinds of teaching procedures, and all of 
these were checked by sizable proportions of the 
respondents. In addition, nearly fifty other teaching
procedures were listed. 

Among the teaching procedures listed in the 
questionnaire, those checked most frequently included
‘‘instruction in finding main ideas and supporting
details,’’ ‘‘drill on enlarging the sight 
vocabulary,’’ ‘‘instruction in reading directions,’’ 
‘‘instruction in oral reading,’’ ‘‘instruction in 
skimming,’’ and ‘‘study of affixes and roots.’’ 
The most popular teaching procedure seemed to 
be the common, everyday one of drilling on en- 
larging the sight vocabulary. 


Kinds of Materials Employed 


Principle 14: Flexibility should be maintained in 
obtaining materials for remedial and correc- 
tive work; both materials published for remed- 
ial work and materials specially selected or 
prepared by the teachers should be utilized 
according to need. 


Approximately two-thirds of the cooperating 
institutions indicate that they use published books, 
periodicals, workbooks, and textbooks, and also 
prepare or select their own materials. Less than 
one institution in five depends upon published ma- 
terials alone, and less than one in ten depends 
solely upon the preparation of its own materials. 

Within the last twenty years, numerous text- 
books and workbooks designed for use in correc- 
tive and remedial reading have been published. 
Nearly all of these are mentioned at least once 
by the institutions replying to the questionnaire. 
Those mentioned most frequently are relatively 
simple and inexpensive materials, such as the 
Gates-Peardon Practice Exercises in Reading, 
the McCall-Crabbs Test Lessons, Strang’s Study 
Type of Reading Exercises, and the SRA Better 
Reading Books; and periodicals such as the Read- 
er’s Digest. 


Appraisal and Follow-Up 


Principle 15: In every reading program, im- 
provement resulting from reading instruction 
should be appraised by means of several dif- 
ferent procedures. 


In response to a question concerning proced- 
ures used in appraising improvement, 91 percent 
of the institutions indicate that they use results 
of reading tests, 78 percent the reports of class- 
room teachers, 61 percent the judgment of re- 
medial teachers, and 52 percent changes in school 
grades. The last is an especially severe criter- 
ion, and a difficult one to apply because of the 
number of variables that influence school grades. 
It is noteworthy that more than half the institu- 
tions say they make use of it. 

Many other means of appraisal were mentioned.
Among those listed several times were parents’
judgment of improvement, the judgment of 
the students themselves, changes in pupils’ attitudes
toward reading, and improvement in personality
and adjustment to the school. 


Principle 16: In order to evaluate the program 
thoroughly and to provide continued help for 
individual pupils as needed, provision should 
be made for regular and systematic follow-up 
after the termination of special instruction. 


Approximately one-half the public schools, in- 
dependent schools, and clinics and about one- 
fourth of the colleges say that they do follow up 
individuals regularly and systematically after a 
period of instruction. The percentage of affirm- 
ative replies may be a reflection of bias in the 
sample. If as many as half the schools in the 
country that have reading programs follow up 
their students after they are released from train- 
ing, they have been very reticent about reporting 
the results. With a few exceptions, reports of 
systematic follow-up of remedial pupils are al- 
most non-existent in the literature on research 
in reading. 


Areas Other than Reading 


Principle 17: Schools have an obligation to pro- 
vide special help for pupils having difficulty 
not only with reading but with all basic skills. 


In order to obtain information on corrective 
and remedial instruction in the fields other than 
reading, the following item was included near the 
end of the questionnaire: ‘‘If you have a regular 
program of corrective or remedial instruction in 
areas other than reading, please check the areas 
and indicate the grades included.’’ Space was 
provided for checking arithmetic, spelling, and 
language usage with regard to group corrective 
or individual remedial work and also for specifying 
other areas in which corrective or remedial help 
was given. 

More than 60 percent of the public schools, 
colleges, and clinics and half the independent 
schools failed to check group corrective work. 
Likewise, more than 60 percent of the public 
schools, colleges, and clinics, and more than 
50 percent of the independent schools failed to 
check the individual remedial work. It seems 
clear that neither group corrective instruction 
nor individual remedial instruction is a common 
practice in areas other than reading for the schools 
included in the study. 

Except among the colleges, special help is 
more often given in spelling than in the other bas- 
ic skills, reading excluded. Approximately a 
fourth of the public schools, a third of the clinics, 
and two-fifths of the independent schools say that 
they have group corrective work in spelling. The 
colleges favor corrective instruction in language 
usage (presumably including spelling), in which more 
than one-third of them offer special help. 
From a fifth to a third of the schools and clinics 
offer group corrective help in arithmetic, while, 
as one would expect, the number of colleges giv- 
ing special attention to this area is very small. 

The trends in the checking to indicate individ- 
ual remedial work in spelling, language usage, 
and arithmetic are similar to those for group corrective
work, although the percentages tend to be 
smaller. 

Aside from the three areas listed for check- 
ing on the questionnaire, the area mentioned most 
frequently as one in which corrective and remed- 
ial instruction is given was speech. 


Summary 


In summary, this survey indicates that a sizable
number of schools represented in the NART 
membership have undertaken developmental read- 
ing programs for their pupils; that a large ma- 
jority provide corrective and remedial instruc- 
tion for pupils retarded in reading; that most of 
the schools have organized corrective classes in 
the regular school day; that, as a rule, less than 
10 percent of the pupils are in the remedial pro- 
gram; that the great majority of the schools ad- 
minister standardized reading tests regularly to 
their pupils; that nearly all these institutions em- 
ploy tests of mental ability along with reading 
tests; that more than one-half the schools and 
colleges cooperating in the study have one or more 
teachers giving full time to remedial instruction; that 
specialists such as psychologists, psychiatrists, 
eye specialists and hearing specialists are available
to the majority of the schools but that a comparatively
small proportion have such specialists
on their regular staffs; that it is common 
practice for institutions offering special help in 
reading to check on contributing factors, such as 
handedness, eyedness, visual acuity, auditory 
acuity, speech defects, visual abnormalities, lim- 
ited vocabulary, spelling disability, and person- 
ality difficulty; that there is much difference of 
opinion and uncertainty concerning the influence 
of personality deviation upon reading disability; 
that, with the exception of the tachistoscope, spec- 
ific kinds of mechanical devices for use in diag- 
nosis and instruction are to be found in a minor- 
ity of the institutions replying to the questionnaire; 
that a wide variety of teaching procedures is used 
with pupils retarded in reading; that the major- 
ity of the cooperating institutions obtain materials 
for the remedial program through the use of pub- 
lished books, workbooks, and textbooks, and 
through the preparation of their own materials; 
that improvement resulting from reading instruc- 
tion is appraised by means of several procedures, 
including especially results of tests, reports of 
classroom teachers, judgment of remedial tea- 
chers, and changes in school grades; that about 
half the schools and a fourth of the colleges say 
that they follow up individuals regularly and sys- 
tematically after a period of instruction; that re- 
medial and corrective instruction in the basic 
skills other than reading is not especially common
practice among these schools, although a considerable
number of the schools do offer remedial
or corrective instruction in spelling, language 
usage, or arithmetic, and some of the schools 
make special provision for the correction of 
speech difficulties. 
PROCEDURES FOR COMPUTATION OF ZERO- 
ORDER COEFFICIENTS AMONG 
SEVERAL VARIABLES 


J. FRANCIS RUMMEL 
University of Oregon 
Eugene, Oregon 


THE COMPUTATION of zero-order correlation
coefficients¹ among several variables 
cannot ordinarily be done by one who has not had 
considerable statistical training. The purpose 
of this paper is to present, in four steps, compu- 
tational procedures simplified to the extent that 
anyone familiar with the use of electrical calcu- 
lators can readily carry out the computation nec- 
essary. 

The illustrative tables are presented in two 
parts: notation and example. The notation part 
illustrates the statistical processes, in general, 
for the benefit of the reader having a statistical 
background. The example part presents the ac- 
tual numerical values obtained in following each 
step of the procedures. 

The writer has found that clerks without any 
statistical background could follow the procedures
suggested and could accurately compute 
zero-order correlation coefficients among sev- 
eral variables. 


STEP I 


Obtain the Sums, Sums of Squares, and Sums of 
Products for all Variables on the IBM Tabu- 
lator Sheets 


In this procedure it is assumed that the raw 
data have been compiled on IBM Tabulator sheets 
from which the sums, sums of squares and 
sums of products are obtainable. An example of 
the usual form of presentation of these values on 
IBM Tabulator sheets is presented in Table I. 
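
As a minimal sketch of why these three kinds of totals are sufficient, the standard raw-score form of the product-moment coefficient can be evaluated from them directly. The function name `zero_order_r`, the argument names, and the small data set are assumptions made for illustration; the number of cases is not stated in this part of the paper, and the worksheet steps that follow may arrange the same arithmetic differently.

```python
import math


def zero_order_r(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """Product-moment r computed from raw totals only.

    n       -- number of cases (an assumed input; not given in the text)
    sum_x   -- sum of scores on the first variable
    sum_y   -- sum of scores on the second variable
    sum_xx  -- sum of squares of the first variable
    sum_yy  -- sum of squares of the second variable
    sum_xy  -- sum of products of the two variables
    """
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2)
    )
    return numerator / denominator


# Hypothetical scores for four cases, used only to exercise the function.
x = [2, 4, 6, 8]
y = [1, 3, 2, 5]
r_xy = zero_order_r(
    len(x),
    sum(x),
    sum(y),
    sum(a * a for a in x),
    sum(b * b for b in y),
    sum(a * b for a, b in zip(x, y)),
)
print(round(r_xy, 4))
```

Because only totals enter the formula, the raw scores need not be handled again once the sums, sums of squares, and sums of products have been read from the sheets.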


A. To Obtain Sums of Scores 


These are the figures in the last row under 
each variable. The sums of scores may most 
easily be obtained by using only the last IBM Tabulator
sheet containing all the variables. In the 
example shown in Table I, the sums of scores 


are given at the end of each column of figures as 
follows: 


Variable      Sum 

(1)          5361 
(2)          3324 
(3)          3263 


B. To Obtain Sums of Squares and Sums 
of Products 


1. Draw a red line under the figures in the row 
numbered 1 of each column. 


2. Using a calculator, add all figures in the upper
set of the first column. Do not include the 
figure under the red line. 


3. Leaving the above sum in the calculator, move 
the carriage to the right one space and add all 
figures in the lower set of the first column. 
Do not include the figure under the red line. 


4. The resulting sum is then the sum of squares 
for the first variable, or the product of variable
(1) and variable (1). 


5. Repeat these operations for each of the other 
variables on all sheets. On sheet two, the 
resulting sum of the first column is the sum 
of the products of variable (2) and variable (1). 
The resulting sum of the second column is the 
sum of squares of variable (2). The various 
products in the example are as follows: 


Variables        Products 

(1) x (1)        399,147 
(2) x (1)        238,830 
(2) x (2)        188,410 
(3) x (1)        234,527 
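
The calculator operations of Part B can be sketched as follows, using the two columns of Table I that reproduce the listed products exactly. This is an illustration under one stated assumption: moving the carriage one space to the right is modeled as giving the lower set ten times the weight of the upper set. The function name `sum_from_progressive_totals` is hypothetical.

```python
def sum_from_progressive_totals(upper_set, lower_set):
    """Combine the two sets of figures in one column of a tabulator sheet.

    The last figure in each set (the one under the red line) is left out.
    Assumption: shifting the carriage one space multiplies every figure
    of the lower set by 10 before it is accumulated.
    """
    upper_total = sum(upper_set[:-1])
    lower_total = sum(lower_set[:-1])
    return upper_total + 10 * lower_total


# Sheet 1, column (1) of Table I.
sheet1_upper = [384, 1026, 1391, 2067, 2912, 3296, 3495, 4205, 4571, 5361]
sheet1_lower = [1216, 2487, 3598, 4244, 4888, 5155, 5302, 5329, 5361, 5361]

# Sheet 2, column (1) of Table I.
sheet2_upper = [408, 954, 1403, 1980, 2878, 3184, 3342, 4088, 4793, 5361]
sheet2_lower = [159, 498, 1111, 1832, 2249, 2875, 3719, 4351, 4786, 5361]

sum_sq_1 = sum_from_progressive_totals(sheet1_upper, sheet1_lower)
sum_prod_21 = sum_from_progressive_totals(sheet2_upper, sheet2_lower)
print(sum_sq_1)     # 399147, the listed value for (1) x (1)
print(sum_prod_21)  # 238830, the listed value for (2) x (1)
```

Under the same assumption, the upper set of each column contributes at face value and the lower set at ten times its value, which is what the single carriage shift accomplishes on the machine.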


1. For the statistical bases see Kossack, Carl F., "On the Computation of Zero-Order
Correlation Coefficients," Psychometrika, XIII (1948), pp. 91-93. See also Adkins,
Dorothy C., "Note on the Computation of Product-Moment Correlation Coefficients,"
Psychometrika, XIV (1949), pp. 69-75. 


TABLE I


SAMPLE OF RAW DATA OBTAINABLE FROM IBM
TABULATOR SHEETS
(3-Variable Example)


        Sheet 1       Sheet 2             Sheet 3
        Variable 1    Variable 2          Variable 3
Row     (1)           (1)      (2)        (1)      (3)

Upper set
9        384           408      205        415      355
8       1026           954      569        874      817
7       1391          1403      768       1511     1153
6       2067          1980     1116       2264     1619
5       2912          2878     1626       2773     1849
4       3296          3184     1836       3114     2039
3       3495          3342     1902       2799     2359
2       4205          4088     2294       4270     2683
1       4571          4793     2868       4897     3053
0       5361          5361     3324       5361     3263

Lower set
9       1216           159      181        159      190
8       2487           498      506        337      361
7       3598          1111     1101        868      890
6       4244          1832     1742       1307     1343
5       4888          2249     2110       2213     2016
4       5155          2875     2521       3096     2574
3       5302          3719     2938       3768     2883
2       5329          4351     3157       4401     3117
1       5361          4786     3267       4902     3226
0       5361          5361     3324       5361     3263


Note: The figures in the last row of each group are the same for
each variable, i.e., for (1), (2), and (3). These figures
are the sums of the raw scores for the three variables.


There are as many sheets of data as there are variables
to be intercorrelated.
TABLE II


THE SUMMATION MATRIX


Notation

Variables             (1)         (2)         (3)
            Sums      ΣX1         ΣX2         ΣX3
(1)         ΣX1       Σ(X1²)
(2)         ΣX2       Σ(X2X1)     Σ(X2²)
(3)         ΣX3       Σ(X3X1)     Σ(X3X2)     Σ(X3²)


Example

Variables             (1)         (2)         (3)
            Sums      5361        3324        3263
(1)         5361      399,147
(2)         3324      238,830     188,410
(3)         3263      234,527     170,371     181,927
(3) x (2)     170,371
(3) x (3)     181,927
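

The additions in Step I B can be sketched in Python. The sketch rests on an assumption that the text does not state outright: that each sheet lists running totals in the manner of progressive digiting, where the cards are sorted on one digit of the sheet's variable and the running total of each column variable is printed as that digit falls from 9 to 0, once for the tens digit (upper set) and once for the units digit (lower set). The function names and the three scores are hypothetical and serve only to show why the upper-set sum is shifted one place.

```python
def running_totals(digits, values):
    """Rows 9, 8, ..., 0: total of `values` over cards whose digit is >= row."""
    return [sum(v for d, v in zip(digits, values) if d >= row)
            for row in range(9, -1, -1)]


def sum_of_products(x, y):
    """Sum of x*y for two-digit scores x, rebuilt from tabulator-style totals."""
    upper = running_totals([s // 10 for s in x], y)  # tens digit of x
    lower = running_totals([s % 10 for s in x], y)   # units digit of x
    # Rows 9..1 are added; row 0 (under the red line) is the plain sum of y.
    # The one-place shift gives the upper set ten times the weight
    # of the lower set (assumed meaning of the carriage move).
    return 10 * sum(upper[:-1]) + sum(lower[:-1])


x = [12, 47, 80]   # hypothetical scores on the sheet's variable
y = [3, 5, 7]      # hypothetical scores on the column variable
direct = sum(a * b for a, b in zip(x, y))
print(sum_of_products(x, y), direct)
```

Under this assumption the last row of every set is simply the sum of the column variable, which agrees with the note to Table I.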


STEP II
Prepare a Summation Work Sheet 
A. Summation Matrix 


Make up a square matrix similar to the one 
shown in Table II. There should be two columns 
and two rows more than the number of variables 
to be correlated. The two rows at the top of the
matrix and the two columns on the left side are 
the border cells. 


B. Border Cells 


In the outside borders write in the numbers 
or the names of the variables. In the inside bor- 
ders write in the sums of scores for the corres- 
ponding variables as found in Step IA. 


C. Body Cells 


Fill in the sums of products found in Step IB5 
in the body of the table in the cell at the intersec- 
tion of the corresponding variables. For exam- 
ple: In the cell at the intersection of the first 
row and the first column, write in the product of 
variables (1) and (1). In the cell at the
intersection of the third row and the second column, write
in the product of variables (3) and (2). Since 
the matrix is symmetrical, only the diagonal cells, 
and the cells below the diagonal need to be filled 
in. 
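

The layout of the summation work sheet can be sketched as follows, using the sums and products of the example. The dictionary layout and the function name `summation_rows` are assumptions made for illustration; only the cells on and below the diagonal are produced, as the text directs.

```python
# Sums of scores (Step I A) and sums of products (Step I B) of the example.
sums = {1: 5361, 2: 3324, 3: 3263}
products = {(1, 1): 399147,
            (2, 1): 238830, (2, 2): 188410,
            (3, 1): 234527, (3, 2): 170371, (3, 3): 181927}


def summation_rows(sums, products):
    """One row per variable: outside border, inside border, body cells."""
    variables = sorted(sums)
    rows = []
    for i in variables:
        body = [products[(i, j)] for j in variables if j <= i]
        rows.append([i, sums[i]] + body)
    return rows


for row in summation_rows(sums, products):
    print(*row, sep="\t")
```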


STEP III
Prepare a Variance Matrix 
A. Variance Matrix 


Make up a square matrix similar to the prev- 
ious one as shown in Table III. 


B. Computation of Values in Body Cells 
(Variances) 


Write in the variances of all combinations of 
variables in the cells located at the intersection 
of the corresponding variables. These are com- 
puted as follows: 

1. For the cell at the intersection of the first
row and the first column. Using a calculator,
multiply the number (399,147), in the corresponding
cell of Table II, by N (78), the number of
individuals in the sample. Leaving this number in
the calculator, set up the number in the border
cell of the first row (5361) and negatively multiply
by the number in the border cell of the first
column (5361), so that the product of the two border
numbers is subtracted. Enter the resulting value,
shown in the lower dial of the calculator, in this
cell. In the example, this value is 2,393,145.

2. For the cell at the intersection of the third
row and the second column. Multiply the number
(170,371), in the corresponding cell of Table
II, by N (78), and then negatively multiply the
corresponding row and column border numbers
(3263 x 3324). The resulting value (2,442,726) is
to be entered in the corresponding cell of Table III.

3. Repeat these operations for all cells on and
below the diagonal in Table III. 
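

The cell computation above amounts to N times the sum of products minus the product of the two sums. A minimal sketch with the figures of the example follows; the function name `variance_cell` and the dictionary layout are assumptions, not part of the original procedure.

```python
N = 78  # number of individuals in the sample

sums = {1: 5361, 2: 3324, 3: 3263}
products = {(1, 1): 399147,
            (2, 1): 238830, (2, 2): 188410,
            (3, 1): 234527, (3, 2): 170371, (3, 3): 181927}


def variance_cell(i, j):
    """N * sum(Xi*Xj) - sum(Xi) * sum(Xj), the value entered in Table III."""
    return N * products[(i, j)] - sums[i] * sums[j]


variance = {cell: variance_cell(*cell) for cell in products}
print(variance[(1, 1)])  # 2393145, the first worked value in the text
print(variance[(3, 2)])  # 2442726, the second worked value in the text
```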


C. Computation of Values in Border Cells 


Obtain the reciprocal of the square root of each 
of the diagonal values given in Table III, and write 
them in the inside border cells. These values 
will be the same for the row border cells as for 
the column border cells, i.e., the value for the 
border cell of the first row will be the same as 
the value for the border cell of the first column, 
and so on. 


1. Compute the value for the border cell of
the first row as follows: Extract the square root
of the number in the cell located at the intersection
of the first row and the first column (2,393,145).
Carry out the extraction of square roots
to six figures. The obtained square root for this
value is 1546.98. Then, divide 1 by the square
root obtained. Carry out divisions to five digits
beyond the zeros and round off the last digit. This
value is .00064642. Enter the value obtained in
the proper cell, i.e., the inside border cell of the
first row.

2. Compute the values for the border cells in 
the other rows by using the corresponding diag- 
onal values in Table III. 

3. Fill in the border cells for the columns by 
copying the numbers in the corresponding border 
cells for the rows. 
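

The border values of Table III can be checked with a short sketch. It recomputes each diagonal value from the example's sums and sums of squares and then takes the reciprocal of its square root; the variable names are assumptions, and full floating-point precision is used in place of the six-figure square root of the hand method.

```python
import math

N = 78
sums = {1: 5361, 2: 3324, 3: 3263}
squares = {1: 399147, 2: 188410, 3: 181927}  # sums of squares from Step I B

border = {}
for v in sums:
    diagonal = N * squares[v] - sums[v] ** 2  # diagonal cell of Table III
    border[v] = 1 / math.sqrt(diagonal)       # same value for row and column

print(round(math.sqrt(2393145), 2))  # 1546.98
print(f"{border[1]:.8f}")            # 0.00064642
```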


STEP IV 


Prepare a Correlation Matrix 


A. Correlation Matrix 


Make up a square matrix similar to the prev- 
ious one with one less border row and border 
column, as shown in Table IV. 


TABLE IV


THE CORRELATION MATRIX


B. Computation of Intercorrelations


Write in the intercorrelations of all variables
in the cells located at the intersections of the
corresponding variables. These values are
computed as follows:

1. Using the values in Table III, multiply a
cell value by the product of the number in the
border cell of the corresponding row and the
number in the border cell of the corresponding
column. In computing the correlation coefficient to
be entered in the cell located at the intersection
of the second row and the first column, the first
operation is to multiply the number in the border
cell of the second row (.000,523,638) by the
number in the border cell of the first column
(.000,646,421), using only the digits to the right
of the zeros. Then, multiply the obtained product by


the number in the cell located at the intersection
of the second row and the first column of Table III
(808,776, i.e., 78 x 238,830 less 3324 x 5361).
Set the decimal point so that there are the
same number of digits to the right of it as the
total number of digits, including zeros, in the


two border numbers. This is the correlation co- 
efficient to be entered in the corresponding cell 
(row 2 and column 1). 

2. Compute the values for the other correlation
coefficients in the same manner. If there
are no computational errors, the values obtained
for the diagonal cells will be approximately
unity (1.00000 or .99999).
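

The whole of Step IV can be sketched as one pass over the cells of Table III, multiplying each cell value by its row and column border numbers. The names `cell`, `border`, and `r` are assumptions; the sketch carries the reciprocals at full precision, so its results may differ in the last place from figures obtained with the rounded border values.

```python
import math

N = 78
sums = {1: 5361, 2: 3324, 3: 3263}
products = {(1, 1): 399147,
            (2, 1): 238830, (2, 2): 188410,
            (3, 1): 234527, (3, 2): 170371, (3, 3): 181927}


def cell(i, j):
    """Body cell of Table III: N * sum(Xi*Xj) - sum(Xi) * sum(Xj)."""
    return N * products[(i, j)] - sums[i] * sums[j]


# Border cells of Table III: reciprocal square roots of the diagonal values.
border = {v: 1 / math.sqrt(cell(v, v)) for v in sums}

# Correlation = cell value x row border number x column border number.
r = {(i, j): cell(i, j) * border[i] * border[j] for (i, j) in products}

for (i, j), value in sorted(r.items()):
    print(f"r({i},{j}) = {value:.5f}")
```

The diagonal entries come out as unity to within rounding, which is the check on the arithmetic that the text recommends.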